Method to measure P-Scalability in Distributed Systems - scalability

The P-Scalability measure is a method for measuring the scalability of distributed systems. I have tried hard to find its formula on Google, but could not find it clearly explained anywhere.
If anyone knows it, could you please explain it, or provide the formula details or resources from which I can understand it?
Thanks in anticipation.

Firstly, did you find an answer in the mean-time?
If not here is what I found so far:
I found a formula and a brief explanation of P-scalability in the paper Evaluating the Scalability of Distributed Systems, which points to its reference 11:
P.P. Jogalekar and C.M. Woodside, "A Scalability Metric for Distributed Computing Applications in Telecommunications," Proc. 15th Int'l Teletraffic Congress (Teletraffic Contributions to the Information Age), pp. 101-110, 1997.
I did not manage to find a freely available version of it online, but it is written by the same authors as Evaluating the Scalability of Distributed Systems. With a university account you could retrieve a copy.
If I find further details, I will edit the answer.
I also reference a related question asked by me.
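In case it helps while you check the papers, here is my reading of the productivity-based scalability metric those papers build on, written as a small Python sketch. Treat the formulas as assumptions to verify against the originals (the exact P-scalability variant may differ): productivity is throughput times the value of each response, divided by cost, and scalability between two configurations is the ratio of their productivities.
```python
# Hedged sketch of the Jogalekar/Woodside productivity-based scalability metric,
# as I understand it from "Evaluating the Scalability of Distributed Systems".
# Verify the exact P-scalability definition against the 1997 ITC paper (ref. 11).

def productivity(throughput, value_per_response, cost):
    """F(k) = lambda(k) * f(k) / C(k)

    throughput         -- lambda(k): completed responses per second at scale k
    value_per_response -- f(k): value of one response, usually a decreasing
                          function of response time (1.0 = target met)
    cost               -- C(k): cost of running the system at scale k
    """
    return throughput * value_per_response / cost

def scalability(f_k1, f_k2):
    """psi(k1, k2) = F(k2) / F(k1); greater than 1 means scaling up paid off."""
    return f_k2 / f_k1

# Illustrative numbers only (not from any real measurement):
f1 = productivity(throughput=100.0, value_per_response=1.0, cost=10.0)
f2 = productivity(throughput=180.0, value_per_response=0.9, cost=20.0)
print(scalability(f1, f2))  # ~0.81 -> the scaled system is less productive per unit cost
```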

Related

Tips for writing an algorithm for paraphrasing sentences (machine learning)

I am doing a project at university and I need to train an algorithm to rephrase sentences; what can you advise for the implementation? Is it possible to translate the text into another language and back with a translator to get a paraphrased sentence in the end? Also, I want to use Word2Vec; or is that a bad idea?
This kind of broad-advice question, about a very tough problem (paraphrasing text) that is still a very active research area, would be better answered by surveying the research literature.
A great site for searching relevant papers – and then finding other related papers once you've set some positive examples – is http://www.arxiv-sanity.com/.
Searching for [paraphrasing] or [summarization] would give you a running start in seeing major techniques & their limitations. And, once you start bookmarking papers via the little 'disk' icon, it can auto-suggest important related papers... so even if your first few finds are tangential or not directly useful, it can lead you to the seminal papers, & the prevailing cutting-edge algorithms/libraries, pretty quickly.
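On the specific translate-and-back idea from the question: round-trip ("back") translation is a common, if crude, paraphrasing baseline. Below is a minimal sketch assuming the Hugging Face transformers package and the publicly available Helsinki-NLP MarianMT English/French checkpoints; it is a starting point for experimentation, not a recommended final approach.
```python
# Minimal back-translation paraphrasing sketch.
# Assumes `pip install transformers sentencepiece torch` and that the
# Helsinki-NLP MarianMT checkpoints can be downloaded.
from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model_name):
    tok = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tok(texts, return_tensors="pt", padding=True)
    out = model.generate(**batch)
    return tok.batch_decode(out, skip_special_tokens=True)

def paraphrase(texts):
    # English -> French -> English; the round trip often rewords the input.
    french = translate(texts, "Helsinki-NLP/opus-mt-en-fr")
    return translate(french, "Helsinki-NLP/opus-mt-fr-en")

print(paraphrase(["I need to train an algorithm to rephrase sentences."]))
```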

Down-to-earth introduction to time-series for a programmer

I'm a programmer who is interested in processing and analyzing time-series data. I know basic statistics and math, but I'm afraid that's all.
Can you please recommend good books and/or articles that don't require a Ph.D. to understand?
As for my concrete tasks: I want to be able to spot trends, eliminate outliers, make predictions, and calculate stats over a range of values. We have quite a lot of events coming off our systems.
I started reading "Introduction to Time Series and Forecasting" by Brockwell and Davis - and I'm completely lost in math.
Update on outliers: by outliers I mean data points that don't necessarily make sense, e.g. the exchange rate is $1.50 (± 10 cents) to the pound on average, but a guy around the corner offers $1.09 and says he's completely legit.
I've found the NIST Engineering Statistics Handbook's chapter on time series to be a simple and clear introduction to basic time series modeling. It discusses exponential smoothing, auto-regressive, moving average, and eventually ARMA time series modeling. These can be used for trend analysis and possibly prediction, subject to validation.
Outlier/anomaly detection is a much different task; the NIST book doesn't have much on this. It would be helpful to know what kind of outliers you are trying to detect.
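To give a feel for the NIST material in code, here is a minimal sketch of a rolling-mean trend estimate and simple exponential smoothing, using only numpy and pandas on made-up data (the synthetic series and all numbers are invented for illustration).
```python
# Minimal trend-smoothing sketch on synthetic data (numpy/pandas only).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Fake daily series: upward trend plus noise.
values = np.linspace(1.40, 1.60, 200) + rng.normal(0, 0.02, 200)
series = pd.Series(values, index=pd.date_range("2015-01-01", periods=200))

# Rolling mean: quick-and-dirty trend estimate.
trend = series.rolling(window=14).mean()

# Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.
def exp_smooth(x, alpha=0.2):
    s = np.empty_like(x)
    s[0] = x[0]
    for t in range(1, len(x)):
        s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    return s

smoothed = pd.Series(exp_smooth(series.to_numpy()), index=series.index)
print(trend.tail(3))
print(smoothed.tail(3))
```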
I've gone through numerous books and articles, and here are my findings. Maybe they will help others like me.
Regarding theory: I found the article An Introductory Study on Time Series Modeling and Forecasting very well written. That doesn't mean I understood all of its contents, but it's a really good overview of the available time series models.
If you're like me and like to see some actual code, there's an article series on QuantStart. The examples are in R, but I guess many of them are portable to Python.
I can highly recommend the QuantStart blog by Michael Halls-Moore; I found the articles easy to read and the author has done a great job of not overwhelming the reader with math. I also read Michael's first book and it's a good one for a beginner in the space like me.
Textbooks on the topic are extremely hard for me to read. I tried Time Series Analysis by Hamilton, but haven't gotten far.
Regarding outlier detection I mentioned - I've found this question on SO and its stats counterpart. By the looks of it, it's not something you can study and implement in a couple of evenings, at least not for me.
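For the simple kind of outlier described in the question (the $1.09 offer against a $1.50 ± 10 cents going rate), a rolling z-score is often a reasonable first pass. The sketch below follows the same numpy/pandas style; the 3-standard-deviation threshold is just a conventional starting value, not a recommendation.
```python
# First-pass outlier flagging with a rolling z-score (synthetic exchange-rate data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
rate = pd.Series(1.50 + rng.normal(0, 0.03, 300))
rate.iloc[150] = 1.09  # the "guy around the corner"

window = 30
mean = rate.rolling(window).mean()
std = rate.rolling(window).std()
z = (rate - mean) / std

# Points more than 3 rolling standard deviations from the rolling mean.
outliers = rate[z.abs() > 3]
print(outliers)
```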

How to extract features from fMRI?

I have an fMRI dataset for the classification of normal controls and Alzheimer's patients. As a newbie, I'm unable to extract features from my dataset. I want to extract activation patterns, GM, WM and CSF volumetric measures, and haemodynamics in numerical form. Please guide me on how and where to start, and please suggest some easy and efficient software for my work... I'll be obliged...
Take a look at the software packages called FSL (FMRIB Software Library) and SPM (Statistical Parametric Mapping).
Each of them can do the kind of analyses you're asking about. However, be warned that none of these analyses are trivial. You should probably read up a bit on the subject, first. The Handbook of Functional MRI Data Analysis is a great place to start for beginners.
As #WeirdAlchemy says, these are many analyses you want to carry out, and all of them are non-trivial. You typically learn these over weeks at a relevant intensive course, or over months during a neuro Masters programme. To answer your question very explicitly:
GM, WM & CSF volumetric measures - You can do this with FSL SIENA, SPM VBM, AFNI 3Dclust, among others.
"Extract activation patterns" is too vague. In all probability, you likely have task-related BOLD fMRI data and want to perform a general linear model (GLM) analysis. FSL FEAT, SPM fMRI, AFNI and others support this. However, without knowing the experimental design, the nature of the data, and what you want to learn from it, it's hard to be more specific about which tool is appropriate.
"Haemodynamics in numerical form" This can mean a number of things, but if you are thinking about the amount of haemodynamic signal modulation (e.g. Condition led to a 2% change in BOLD signal), you get that out of the GLM analysis mentioned above.

Increasing the efficiency of equipment using Amazon Machine Learning

The problem statement is kind of vague, but I am looking for directions; because of privacy policy I can't share exact details, so please help out.
We have a problem at hand where we need to increase the efficiency of equipment, or in other words decide at which values across multiple parameters the machines should operate to produce optimal outputs.
My query is whether it is possible to come up with such numbers using Linear Regression or Multinomial Logistic Regression algorithms; if not, can you please specify which algorithms would be more suitable? Also, can you please point me to some active research on this kind of problem that is available in the public domain?
Does the type of problem I am asking suggestions for fall within the area of Machine Learning?
Lots of unknowns here but I’ll make some assumptions.
What you are attempting to do could probably be achieved with multiple linear regression. I have zero familiarity with the Amazon service (I didn’t even know it existed until you brought this up, it’s not available in Europe). However, a read of the documentation suggests that the Amazon service would be capable of doing this for you. The problem you will perhaps have is that it’s geared to people unfamiliar with this field and a lot of its functionality might be removed or clumped together to prevent confusion. I am under the impression that you have turned to this service because you too are somewhat unfamiliar with this field.
Something that may suit your needs better is Response Surface Methodology (RSM), which I have applied to industrial optimisation problems that I think are similar to what you suggest. RSM works best if you can obtain your data through an experimental design such as a Central Composite Design or Box-Behnken design. I suggest you spend some time Googling these terms to get your head around them, I don’t think it’s an unmanageable burden to learn how to apply these with no prior experience in this area. Because your question is vague, only you can determine if this really is suitable. If you already have the data in an unstructured format, you can still generate an RSM but it is less robust. There are plenty of open-access articles using these techniques but Science Direct is conveniently down at the moment!
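To illustrate the RSM idea concretely, the sketch below fits a second-order (quadratic) response surface to made-up data over two process parameters and picks the setting with the best predicted output. It assumes scikit-learn is available; in practice the design points would come from a Central Composite or Box-Behnken design rather than a random sample.
```python
# Toy response-surface sketch: quadratic model over two process parameters.
# Data are synthetic; real work would use a proper experimental design (CCD/Box-Behnken).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
temp = rng.uniform(60, 100, 40)        # hypothetical parameter 1 (e.g. temperature)
speed = rng.uniform(1.0, 3.0, 40)      # hypothetical parameter 2 (e.g. line speed)
X = np.column_stack([temp, speed])

# Pretend "true" process: peak efficiency near temp=85, speed=2.1, plus noise.
y = 100 - 0.05 * (temp - 85) ** 2 - 15 * (speed - 2.1) ** 2 + rng.normal(0, 1, 40)

# Second-order response surface: linear, interaction, and squared terms.
poly = PolynomialFeatures(degree=2, include_bias=False)
model = LinearRegression().fit(poly.fit_transform(X), y)

# Evaluate the fitted surface on a grid and take the best predicted setting.
tg, sg = np.meshgrid(np.linspace(60, 100, 81), np.linspace(1.0, 3.0, 81))
grid = np.column_stack([tg.ravel(), sg.ravel()])
pred = model.predict(poly.transform(grid))
best = grid[np.argmax(pred)]
print(f"predicted best setting: temp={best[0]:.1f}, speed={best[1]:.2f}")
```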
Minitab is a software package that will do all the regression and RSM for you. Its strength is that it has a robust GUI and partially resembles Excel, so it is far less daunting to get into than something like R. It also has plenty of guides online. They offer a 30-day free trial, so it might be worth doing some background reading, collecting the tutorials you need, and developing a plan of action before downloading the trial.
Hope that is some help.

Machine learning/information retrieval project

I'm reading towards an M.Sc. in Computer Science and have just completed the first year of the course (it is a two-year course). Soon I have to submit a proposal for the M.Sc. project. I have selected the following topic.
“Suitability of machine learning for document ranking in information retrieval system”. Researchers have been using various machine learning algorithms for ranking documents. So as the first phase of the project I will be doing a complete literature survey and finding out advantages/disadvantages of current approaches. In the second phase of the project I will be proposing a new (modified) algorithm in order to overcome the limitations of current approaches.
Actually, my question is whether this type of project is suitable as an M.Sc. project. Moreover, if somebody has an interesting idea in the information retrieval field, would it be possible to share those ideas with me?
Thanks
Ranking is always the hardest part of any Information Retrieval system. I think it is a very good topic, but you have to take care to define the scope of the work as soon as possible. You will probably not be able to develop a new IR engine, but rather build a prototype based on, e.g., Apache Lucene.
Currently there are a lot of datasets, including the Stack Overflow data dump, which give you all the information you need to define a rich feature vector (number of points, time, topics mined from previous questions, popularity of a tag, etc.) for your machine learning ranking algorithm. In this part of the work you could, e.g., classify types of features (user-specific, semantic features such as a software name in the title) and perform a series of experiments to learn which features are most important, and which are not, for a given dataset.
A second direction for such a project could be how to perform the learning efficiently. The reason is the quantity of data on the web and in community forums, and the rate of change within a forum (important if you use community-specific features), e.g., changes in technologies, new software releases, etc.
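To make the feature-vector idea a bit more tangible, here is a minimal pointwise ranking sketch on invented Stack Overflow-style features (score, age, tag popularity): a simple classifier predicts relevance, its predicted probability is used as the ranking score, and the learned weights give a crude version of the feature-importance experiments mentioned above. The feature names and data are made up; a real project would use a relevance-labelled dataset and probably a pairwise or listwise learning-to-rank method.
```python
# Minimal pointwise ranking sketch on invented features (assumes scikit-learn).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
# Hypothetical per-document features: answer score, age in days, tag popularity.
X = np.column_stack([
    rng.poisson(5, n),            # score
    rng.uniform(0, 2000, n),      # age_days
    rng.uniform(0, 1, n),         # tag_popularity
])
# Fake relevance labels, loosely driven by score and tag popularity.
y = (0.4 * X[:, 0] + 3 * X[:, 2] + rng.normal(0, 1, n) > 4).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

# Rank documents by predicted probability of relevance.
scores = model.predict_proba(X)[:, 1]
ranking = np.argsort(-scores)
print("top 5 documents:", ranking[:5])
# Crude feature importance: weights on the standardized features.
print("weights (score, age_days, tag_popularity):", model[-1].coef_[0])
```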
There are many other topics related to search and machine learning. The best idea is to search on scholar.google.com for recent survey papers on ranking, machine learning, and search to learn what the state of the art is. The very next step would be to talk with your M.Sc. supervisor.
Good luck!
Everything you said is good and should be done, but you forgot the most important part:
Prove that your algorithm is better and/or faster than other algorithms, with good experiments and maybe some statistics (p-value, confidence interval).
If you do that and convince people that your algorithm is useful you surely will not fail :)
