I'm looking for educational material on the subject of scalability analysis. I'm not simply looking for Big-O analysis, but for material on approaches and techniques for analysing the scalability of large-scale transactional systems. Amazon's orders and payment systems might be good examples of the sort of systems I'm referring to.
I have a preference for online materials, including text and video, in that they tend to be easily accessible but I'm open to book suggestions, too.
The highscalability blog, for real-life issues.
The AKF Scale Cube is a qualitative means of measuring the scalability of a system.
This model was introduced in the book "The Art of Scalability". You can find a succinct description here.
I am wondering if there are alternatives to the scale cube, to assess qualitatively the scalability of a system.
(In case this question is off-topic, let me know if there are better-suited places for this kind of question.)
In addition to AKF, there are the Transactional Auto Scaler (TAS), the Scalability Patterns Language Map (SPLM), and Person Achieved Ubiquitous Manufacturing (PAUM)...
This PDF describes quantitative and qualitative assessment models of scalability in different areas, such as manufacturing systems, computer science, and organizational/business contexts.
Edit 1
If the above models do not measure, or at least help to measure, scalability (I think they do), then please consider this research on measuring scalability, which discusses several techniques.
Scalability can be measured in various dimensions, such as:
Administrative scalability: The ability for an increasing number of organizations or users to easily share a single distributed system.
Functional scalability: The ability to enhance the system by adding new functionality with minimal effort.
Geographic scalability: The ability to maintain performance, usefulness, or usability regardless of expansion from concentration in a local area to a more distributed geographic pattern.
Load scalability: The ability of a distributed system to expand and contract gracefully to accommodate heavier or lighter loads. And so on.
AKF looks like a model, not an approach to measuring scalability; from the definition:
"The Scale Cube helps teams keep critical dimensions of system scale in mind when solutions are designed and when existing systems are being improved. "
The problem statement is kind of vague, but I am looking for directions; because of our privacy policy I can't share exact details, so please help out.
We have a problem at hand where we need to increase the efficiency of equipment, or in other words, decide at which values across multiple parameters the machines should operate to produce optimal outputs.
My query is whether it is possible to come up with such numbers using linear regression or multinomial logistic regression algorithms; if not, can you please specify which algorithms would be more suitable? Also, can you please point me to some active research on this kind of problem that is available in the public domain?
Does the type of problem I am asking suggestions for fall into the area of machine learning?
Lots of unknowns here but I’ll make some assumptions.
What you are attempting to do could probably be achieved with multiple linear regression. I have zero familiarity with the Amazon service (I didn’t even know it existed until you brought this up, it’s not available in Europe). However, a read of the documentation suggests that the Amazon service would be capable of doing this for you. The problem you will perhaps have is that it’s geared to people unfamiliar with this field and a lot of its functionality might be removed or clumped together to prevent confusion. I am under the impression that you have turned to this service because you too are somewhat unfamiliar with this field.
Something that may suit your needs better is Response Surface Methodology (RSM), which I have applied to industrial optimisation problems that I think are similar to what you describe. RSM works best if you can obtain your data through an experimental design such as a Central Composite Design or a Box-Behnken design. I suggest you spend some time Googling these terms to get your head around them; I don't think it's an unmanageable burden to learn how to apply them with no prior experience in this area. Because your question is vague, only you can determine if this really is suitable. If you already have the data in an unstructured format, you can still generate an RSM, but it is less robust. There are plenty of open-access articles using these techniques, but Science Direct is conveniently down at the moment!
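If RSM turns out to fit, the regression step is essentially fitting a second-order (quadratic) model to designed-experiment data and then searching the fitted surface. A rough sketch follows; the coded factor levels imitate a small two-factor central composite design, and the responses are made up:

    # Sketch of the regression step of RSM: fit a quadratic model to
    # (invented) designed-experiment data, then scan the surface.
    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression

    # Coded factor settings: factorial, axial, and centre points.
    X = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1],
                  [-1.414, 0], [1.414, 0], [0, -1.414], [0, 1.414],
                  [0, 0], [0, 0], [0, 0]])
    y = np.array([54.0, 60.2, 57.5, 66.9, 52.1, 63.0, 55.3, 61.8,
                  68.0, 67.4, 68.3])

    # Expand to x1, x2, x1^2, x1*x2, x2^2 and fit by least squares.
    quad = PolynomialFeatures(degree=2, include_bias=False)
    Xq = quad.fit_transform(X)
    model = LinearRegression().fit(Xq, y)
    print(dict(zip(quad.get_feature_names_out(["x1", "x2"]),
                   model.coef_.round(3))))

    # Evaluate the surface on a grid to locate the best region.
    g = np.linspace(-1.5, 1.5, 61)
    grid = np.array([[a, b] for a in g for b in g])
    pred = model.predict(quad.transform(grid))
    print("best coded setting:", grid[pred.argmax()],
          "predicted:", pred.max().round(2))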
Minitab is a software package that will do all the regression and RSM for you. Its strength is that it has a robust GUI and partially resembles Excel, so it is far less daunting to get into than something like R. It also has plenty of guides online. They offer a 30-day free trial, so it might be worth doing some background reading, collecting the tutorials you need, and developing a plan of action before downloading the trial.
Hope that is some help.
While my research area is in Machine Learning (ML), I am required to take a project in Programming Languages (PL). Therefore, I'm looking to find a project that is inclined towards ML.
One intersection I know of between the two fields is Natural Language Processing (NLP), but I couldn't find concrete papers in that topic that are related to PL; perhaps due to my poor choice of keywords in the search query.
The main topics in the PL course are: Syntax & Semantics, Static Program Analysis, Functional Programming, and Concurrency and Logic Programming.
If you could suggest papers or keywords that are Machine Learning enthusiast friendly, that would be highly appreciated!
Another very important intersection in these fields is probabilistic programming languages, which provide probabilistic inference over models specified as actual computer programs. It's a growing research field, including a recently started DARPA program on this topic.
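To give a flavour of the idea without committing to any particular PPL (real systems such as Stan, PyMC, or Church-style languages do this far more efficiently), here is a toy, framework-free sketch: the model is an ordinary program, and inference, here crude rejection sampling, runs over its executions:

    # Toy illustration of the probabilistic-programming idea: the
    # model is a plain Python function, and inference runs over it.
    import random

    def model():
        # Generative program: latent rate, then an observable count.
        rate = random.expovariate(1.0)  # prior over the latent
        count = sum(random.random() < rate / (rate + 1)
                    for _ in range(10))
        return rate, count

    # Condition on an observation (count == 7) by keeping latents
    # from runs that agree with the data; the survivors approximate
    # the posterior distribution of the latent rate.
    posterior = [r for r, c in (model() for _ in range(100_000))
                 if c == 7]
    print("posterior mean of rate:",
          sum(posterior) / len(posterior))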
If you are interested in NLP, then I would focus on two aspects of listed PL disciplines:
Syntax & Semantics - as this is incredibly closely realted to the NLP field, where in most cases the understanding is based on the various language grammars. Searching for papers regarding language modeling, information extraction, deep parsing would yield dozens of great research topics which are heavil related to the sytax/semantics problems.
Logic programming - "in the good old years" people believed this was the future of AI; even though that is not (currently) true, it is still quite widely used for reasoning in some fields. In particular, Prolog is a good example of a language that can be used to reason (for example, spatial-temporal reasoning) or even parse language (due to its "grammar-like" productions).
If you wish to tackle a more ML-related problem rather than NLP, then you could focus on concurrency (parallelism), as it is a very hot topic - making ML models more scalable, more efficient, "bigger, faster, stronger" ;) Just look up keywords like GPU machine learning, large-scale machine learning, scalable machine learning, etc.
I also happen to know that there's a project at the University of Edinburgh on using machine learning to analyse source code. Here's the first publication that came out of it.
I am trying to build a recommender system that would recommend webpages to the user based on his actions (Google searches, clicks; he can also explicitly rate webpages). To get an idea, think of the way Google News does it: it displays news articles from across the web on a particular topic. In technical terms that is clustering, but my aim is similar. It will be content-based recommendation based on the user's actions.
So my questions are:
How can I possibly trawl the internet to find related web-pages?
And what algorithm should I use to extract data from a webpage? Are textual analysis and word frequency the only ways to do it?
Lastly, what platform is best suited for this problem? I have heard of Apache Mahout, and it comes with some reusable algorithms; does it sound like a good fit?
As Thomas Jungblut said, one could write several books on your questions ;-)
I will try to give you a list of brief pointers - but be aware there will be no ready-to-use off-the-shelf solution ...
Crawling the internet: There are plenty of toolkits for doing this, like Scrapy for Python, crawler4j and Heritrix for Java, or WWW::Robot for Perl. For extracting the actual content from web pages, have a look at boilerpipe. (A minimal Scrapy sketch follows the links below.)
http://scrapy.org/
http://crawler.archive.org/
http://code.google.com/p/crawler4j/
https://metacpan.org/module/WWW::Robot
http://code.google.com/p/boilerpipe/
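To show how little code a basic crawl takes, here is a minimal Scrapy spider sketch; the spider name, start URL, and CSS selectors are placeholders you would adapt to the pages you actually crawl:

    # Minimal Scrapy spider sketch (URL and selectors are placeholders).
    import scrapy

    class PageSpider(scrapy.Spider):
        name = "pages"
        start_urls = ["https://example.com/"]

        def parse(self, response):
            # Emit the page URL and title as one scraped item.
            yield {
                "url": response.url,
                "title": response.css("title::text").get(),
            }
            # Follow outgoing links to keep crawling (Scrapy
            # de-duplicates repeated requests by default).
            for href in response.css("a::attr(href)").getall():
                yield response.follow(href, callback=self.parse)

You could run it with something like scrapy runspider pagespider.py -o pages.jl and then feed the output into boilerpipe or your indexer.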
First of all, you can often use collaborative filtering instead of content-based approaches. But if you want good coverage, especially in the long tail, there will be no way around analyzing the text. One thing to look at is topic modelling, e.g. LDA; several LDA approaches are implemented in Mallet, Apache Mahout, and Vowpal Wabbit. (A toy sketch of the workflow follows the links below.)
For indexing, search, and text processing, have a look at Lucene. It is an awesome, mature piece of software.
http://mallet.cs.umass.edu/
http://mahout.apache.org/
http://hunch.net/~vw/
http://lucene.apache.org/
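To make the topic-modelling workflow concrete, here is a toy LDA sketch. It uses scikit-learn purely as a stand-in for the toolkits above, and the four one-line documents are invented:

    # Toy LDA sketch: bag-of-words counts, fit a 2-topic model,
    # then read off topic words and per-document topic mixtures.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "the team won the football match",
        "the election results were announced today",
        "the striker scored a late goal",
        "parliament passed the new budget",
    ]

    vec = CountVectorizer(stop_words="english")
    counts = vec.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=2,
                                    random_state=0).fit(counts)

    # Top words per topic.
    words = vec.get_feature_names_out()
    for k, topic in enumerate(lda.components_):
        top = [words[i] for i in topic.argsort()[-4:][::-1]]
        print(f"topic {k}: {top}")

    # Per-document topic mixtures: features a recommender could use.
    print(lda.transform(counts).round(2))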
Besides Apache Mahout, which also contains things like LDA (see above), clustering, and text processing, there are other toolkits available if you want to focus on collaborative filtering: LensKit, which is also implemented in Java, and MyMediaLite (disclaimer: I am the main author), which is implemented in C# but also has a Java port. (A bare-bones illustration of the item-based idea follows the links below.)
http://lenskit.grouplens.org/
http://ismll.de/mymedialite
https://github.com/jcnewell/MyMediaLiteJava
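If you want a feel for what these toolkits do under the hood, here is a bare-bones item-based collaborative filtering sketch in plain NumPy; the rating matrix is invented:

    # Item-based CF sketch: cosine similarity between item columns
    # of a toy user-item rating matrix (0 means "not rated").
    import numpy as np

    R = np.array([
        [5, 4, 0, 1],
        [4, 5, 1, 0],
        [0, 1, 5, 4],
        [1, 0, 4, 5],
    ], dtype=float)  # rows = users, columns = web pages

    # Cosine similarity between items.
    norms = np.linalg.norm(R, axis=0)
    sim = (R.T @ R) / np.outer(norms, norms)

    # Score items for user 0 as a similarity-weighted average of
    # that user's existing ratings, then mask what was already seen.
    user = R[0]
    seen = user > 0
    scores = sim[:, seen] @ user[seen] / sim[:, seen].sum(axis=1)
    scores[seen] = -np.inf
    print("recommend item:", int(scores.argmax()))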
This should be a good read: Google news personalization: scalable online collaborative filtering
It's focused on collaborative filtering rather than content-based recommendations, but it touches on some very interesting points like scalability, item churn, algorithms, system setup, and evaluation.
Mahout has very good collaborative filtering techniques, which match what you describe as using the behaviour of the users (clicks, reads, etc.), and you could introduce some content-based logic using the rescorer classes.
You might also want to have a look at Myrrix, which is in some ways the evolution of the Taste (aka recommendations) portion of Mahout. It also allows applying content-based logic on top of collaborative filtering using the rescorer classes.
If you are interested in Mahout, the Mahout in Action book would be the best place to start.
I have never been interested in optimisation, although almost all of my professors work in it. I have been given a few subjects, one of which is to be used in my thesis (is that the right word?). The result should be an application. So I'm looking for an interesting metaheuristic, evolutionary algorithm, etc., that is not too hard to understand and has various uses. Maybe someone has some experience?
The topics are:
Differential evolution algorithms
Coevolution in metaheuristic algorithms
Multi-objective evolutionary algorithms
...
From my experience, here are some metaheuristic algorithms, ordered from easiest to hardest to learn, along with the results they gave me (again, in my experience):
Hill climbing - bad results
Tabu Search - good results
Great Deluge - bad results
Genetic algorithms - medium results
Simulated Annealing - very good results (if you manage to implement it correctly; a minimal sketch follows)
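Since simulated annealing is the one I'd recommend, here is a minimal sketch on a toy one-dimensional objective; the cooling schedule and the neighbour move are the parts you have to tune for a real problem:

    # Minimal simulated annealing sketch on a toy objective.
    import math, random

    def objective(x):
        # Toy multimodal function to minimise.
        return x * x + 10 * math.sin(x)

    x = random.uniform(-10, 10)   # current solution
    best, best_val = x, objective(x)
    temp = 10.0                   # initial temperature

    while temp > 1e-3:
        candidate = x + random.gauss(0, 1)  # neighbour move
        delta = objective(candidate) - objective(x)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature cools.
        if delta < 0 or random.random() < math.exp(-delta / temp):
            x = candidate
            if objective(x) < best_val:
                best, best_val = x, objective(x)
        temp *= 0.999  # geometric cooling schedule

    print(f"best x = {best:.4f}, f(x) = {best_val:.4f}")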