How to measure scalability in a distributed system

Are there any standard scalability measures for distributed systems in the literature? I've been searching on Google (and Google Scholar) but came up with only a few papers (e.g., https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=862209).
In particular, I was wondering if there are scalability measures for the three axes of the AKF cube, or scale cube (http://microservices.io/articles/scalecube.html), which is described in the book The Art of Scalability by Abbott and Fisher.

There is no standard unit for scalability. However, it is often illustrated with a chart that has the amount of resources on the X-axis and throughput or latency on the Y-axis.
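For instance, given throughput measurements at several resource levels, a minimal sketch of producing such a chart and the derived speedup and efficiency figures might look like the following (the numbers are hypothetical, and matplotlib is assumed to be available):

```python
# Hedged sketch: derive speedup and efficiency from hypothetical
# throughput measurements taken at increasing resource counts.
import matplotlib.pyplot as plt

nodes      = [1, 2, 4, 8, 16]             # resources (X-axis)
throughput = [100, 190, 360, 650, 1100]   # requests/s (hypothetical measurements)

baseline = throughput[0]
speedup    = [t / baseline for t in throughput]        # S(n) = T(n) / T(1)
efficiency = [s / n for s, n in zip(speedup, nodes)]   # E(n) = S(n) / n

for n, s, e in zip(nodes, speedup, efficiency):
    print(f"{n:2d} nodes: speedup {s:5.2f}, efficiency {e:4.2f}")

# The usual scalability chart: resources on the X-axis, throughput on the Y-axis.
plt.plot(nodes, throughput, marker="o")
plt.xlabel("number of nodes")
plt.ylabel("throughput (requests/s)")
plt.title("Throughput vs. resources (hypothetical data)")
plt.show()
```

Speedup and efficiency derived this way are the closest thing to a common quantitative vocabulary: near-linear speedup and efficiency close to 1 are what people usually mean by "it scales".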

Related

AKF Scale cube alternatives?

The AKF scale cube is a qualitative means of measuring the scalability of a system.
This model was introduced in the book The Art of Scalability. You can find a succinct description here.
I am wondering if there are alternatives to the scale cube for qualitatively assessing the scalability of a system.
(In case this question is off-topic, let me know if there are better-suited places for this kind of question.)
In addition to AKF, there are the Transactional Auto Scaler (TAS), the Scalability Patterns Language Map (SPLM), and Person Achieved Ubiquitous Manufacturing (PAUM)...
This PDF describes quantitative and qualitative assessment models of scalability in different areas, such as manufacturing systems, computer science, and organizational and business contexts.
Edit1
If the above models do not measure, or at least help measure, scalability (though I think they do), please consider this research on measuring scalability, which discusses several techniques.
Scalability can be measured in various dimensions, such as:
Administrative scalability: The ability for an increasing number of organizations or users to easily share a single distributed system.
Functional scalability: The ability to enhance the system by adding new functionality at minimal effort.
Geographic scalability: The ability to maintain performance, usefulness, or usability regardless of expansion from concentration in a local area to a more distributed geographic pattern.
Load scalability: The ability to expand and contract to accommodate heavier or lighter loads. And so on.
AKF looks like a model, not an approach to measuring scalability; from the definition:
"The Scale Cube helps teams keep critical dimensions of system scale in mind when solutions are designed and when existing systems are being improved. "

Developing a web bot crawler system after clustering the bots

I am trying to identify high-hitting IPs over a duration of time.
I have performed clustering on certain features and got a 12-cluster output, of which 8 clusters were bots and 4 were humans, according to the centroid values of the clusters.
Now, what technique can I use to analyze the data within each cluster, so as to know that the data points within the cluster are in the right clusters?
In other words, are there any statistical methods to check the quality of the clusters?
What I can think of is this: if I take a data point at the boundary of a cluster and measure its distance from the other centroids and from its own centroid, can I get a sense of how close the neighbouring clusters are to that point, and perhaps of how well my data are divided into clusters?
Kindly guide me on how to measure the quality of my clusters with respect to the data points, and on the standard techniques for doing so.
Thanks in advance!
With k-means, the chances are that you already have a big heap of garbage, because it is an incredibly crude heuristic, and unless you were extremely careful in designing your features (at which point you would already know how to check the quality of a cluster assignment), the result is barely better than choosing a few centroids at random. This is particularly true of k-means, which is very sensitive to the scale of your features: the results are very unreliable if you have features of different types and scales (e.g. height, shoe size, body mass, BMI; k-means on such variables is statistical nonsense).
Do not dump your data into a clustering algorithm and expect to get something useful. Clustering follows the GIGO principle: garbage-in-garbage-out. Instead, you need to proceed as follows:
identify what a good cluster is in your domain. This is very data- and problem-dependent.
choose a clustering algorithm with a very similar objective.
find a data transformation, distance function, or modification of the clustering algorithm to align with your objective.
carefully double-check the result for trivial, unwanted, biased and random solutions.
For example, if you blindly threw customer data into a clustering algorithm, the chances are it would decide that the best answer is 2 clusters, corresponding to the attributes "gender=m" and "gender=f", simply because gender is the most extreme factor in your data. But because this is a known attribute, the result is entirely useless.
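If you do want a quantitative sanity check on a k-means result, one standard option is the silhouette coefficient, which compares each point's distance to its own cluster with its distance to the nearest other cluster (close in spirit to the boundary-point idea in the question). A minimal sketch with scikit-learn, using synthetic blobs as a stand-in for the real IP features:

```python
# Hedged sketch: silhouette analysis of a k-means clustering.
# The data here are synthetic stand-ins for the real IP/hit-rate features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, silhouette_samples
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=500, centers=12, random_state=0)
X = StandardScaler().fit_transform(X)   # k-means is sensitive to feature scale

km = KMeans(n_clusters=12, n_init=10, random_state=0).fit(X)

# Overall quality: mean silhouette in [-1, 1]; higher means better separated clusters.
print("mean silhouette:", silhouette_score(X, km.labels_))

# Per-point silhouettes flag boundary points (values near 0 or negative).
per_point = silhouette_samples(X, km.labels_)
print("worst 5 points:", np.argsort(per_point)[:5])
```

A high silhouette score does not prove the clusters are meaningful in your domain; it only tells you the partition is geometrically coherent, so it complements rather than replaces the manual checks above.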

Use Digital Signal Processors to accelerate calculations in the same fashion as GPUs

I read that several DSP cards that process audio can compute Fourier transforms and some other functions involved in sound processing very quickly. There are some scientific problems (not many), like quantum mechanics, that involve Fourier transform calculations. I wonder if DSPs could be used to accelerate calculations in this fashion, like GPUs do in some other cases, and whether you know of successful examples.
Thanks
Any linear operation is easier and faster to do on DSP chips. Their architecture allows you to perform a linear operation (take two numbers, multiply each of them by a constant, and add the results) in a single clock cycle. This is one of the reasons the FFT can be calculated so quickly on a DSP chip, and also a reason many other linear operations can be accelerated with their use. I guess I have three main points to make concerning performance and code optimization for such processors.
1) Perhaps less relevant, but I'd like to mention it nonetheless. In order to take full advantage of a DSP processor's architecture, you have to code in assembly; I'm pretty sure that regular C code will not be fully optimized by the compiler to do what you want. You literally have to specify each register, etc. It does pay off, however. In the same way, you are able to make use of circular buffers and other DSP-specific features. Circular buffers are also very useful for calculating the FFT and FFT-based (circular) convolution.
2) The FFT can be found in solutions to many problems, such as heat flow (Fourier himself actually came up with the solution back in the 1800s), analysis of mechanical oscillations (or any linear oscillators for that matter, including oscillators in quantum physics), analysis of brain waves (EEG), seismic activity, planetary motion, and many other things. Any mathematical problem that involves convolution can be easily solved via the Fourier transform, analog or discrete (see the sketch after these points).
3) For some of the applications listed above, including audio processing, transforms other than the FFT are constantly being invented, discovered, and applied, such as the mel-cepstrum (e.g. MPEG codecs), the wavelet transform (e.g. JPEG 2000 codecs), the discrete cosine transform (e.g. JPEG codecs), and many others. In quantum physics, however, the Fourier transform is inherent in the relationship between position and momentum. It arises naturally, not just for the purposes of analysis or ease of calculation. For this reason, I would not necessarily put the reasons for using the Fourier transform in audio processing and in quantum mechanics into the same category: for signal processing it's a tool; for quantum physics it's in the nature of the phenomenon.
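To make point 2 concrete, here is a minimal sketch (NumPy, with made-up signals) showing that circular convolution reduces to element-wise multiplication in the frequency domain, which is exactly the structure FFT-friendly hardware accelerates:

```python
# Hedged sketch: circular convolution via the FFT. Convolution in the time
# domain becomes element-wise multiplication in the frequency domain.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # illustrative signal
h = np.array([0.5, 0.5, 0.0, 0.0])   # illustrative filter kernel

# Direct circular convolution: y[n] = sum_k h[k] * x[(n - k) mod N]
N = len(x)
direct = np.array([sum(h[k] * x[(n - k) % N] for k in range(N)) for n in range(N)])

# Same result via the FFT: O(N log N) instead of O(N^2)
via_fft = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))

print(direct)    # [2.5 1.5 2.5 3.5]
print(via_fft)   # matches, up to floating-point error
```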
Before GPUs and SIMD instruction sets in mainstream CPUs, this was the only way to get performance for some applications. In the late 20th century I worked for a company that made PCI cards to place extra processors in a PCI slot. Some of these were DSP cards using a TI C64x DSP, others were PowerPC cards to provide AltiVec. The processor on the cards would typically run no operating system, to give more predictable real-time scheduling than the host. Customers would buy an industrial PC with a large PCI backplane and attach multiple cards. We also made cards in form factors such as PMC, CompactPCI, and VME for more rugged environments.
People would develop code to run on these cards, and host applications which communicated with the add-in card over the PCI bus. These weren't easy platforms to develop for, and the modern libraries for GPU computing are much easier.
Nowadays this is much less common. The price/performance ratio is so much better for general-purpose CPUs and GPUs, and DSPs for scientific computing are vanishing. Current DSP manufacturers tend to target lower-power embedded applications or cost-sensitive high-volume devices like digital cameras. Compare GPUFFTW with these Analog Devices benchmarks: the DSP peaks at 3.2 GFLOPS, while the Nvidia 8800 reaches 29 GFLOPS.

Reading materials for scalability analysis?

I'm looking for educational material on the subject of scalability analysis. I'm not simply looking for Big-O analysis, but for material on approaches and techniques for analysis of the scalability of large scale transactional systems. Amazon's orders and payment systems might be good examples of the sort of systems I'm referring to.
I have a preference for online materials, including text and video, in that they tend to be easily accessible but I'm open to book suggestions, too.
The highscalability blog, for real-life issues

Reasons for NOT scaling-up vs. -out?

As a programmer I make revolutionary findings every few years. I'm either ahead of the curve, or behind it by about π in phase. One hard lesson I learned was that scaling OUT is not always better; quite often the biggest performance gains came when we regrouped and scaled up.
What reasons do you have for scaling out vs. up? Price, performance, vision, projected usage? If so, how did this work for you?
We once scaled out to several hundred nodes that would serialize and cache the necessary data on each node and run maths processes on the records. Many, many billions of records needed to be (cross-)analyzed. It was the perfect business and technical case to employ scale-out. We kept optimizing until we processed about 24 hours of data in 26 hours of wall-clock time. Really long story short, we leased a gigantic (for the time) IBM pSeries, put Oracle Enterprise on it, indexed our data, and ended up processing the same 24 hours of data in about 6 hours. A revolution for me.
So many enterprise systems are OLTP and the data are not sharded, yet the desire of many is to cluster or scale out. Is this a reaction to new techniques or to perceived performance?
Do applications in general today, or our programming mantras, lend themselves better to scale-out? Should we always take this trend into account in the future?
Because scaling up
Is limited ultimately by the size of box you can actually buy
Can become extremely cost-ineffective, e.g. a machine with 128 cores and 128 GB of RAM is vastly more expensive than 16 machines with 8 cores and 8 GB of RAM each.
Some things don't scale up well - such as IO read operations.
By scaling out, if your architecture is right, you can also achieve high availability. A 128-core, 128 GB RAM machine is very expensive, but having a second redundant one is extortionate.
And also to some extent, because that's what Google do.
Scaling out is best for embarrassingly parallel problems. It takes some work, but a number of web services fit that category (hence the current popularity). Otherwise you run into Amdahl's law, which means that to gain speed you have to scale up, not out. I suspect you ran into that problem. IO-bound operations also tend to do well with scaling out, largely because waiting for IO increases the fraction of the work that is parallelizable.
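To put a rough number on that, here is a minimal sketch of Amdahl's law; the parallel fractions below are illustrative, not measurements:

```python
# Hedged sketch: Amdahl's law. If a fraction p of the work is parallelizable,
# the speedup on n machines is bounded by 1 / ((1 - p) + p / n).
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.5, 0.9, 0.99):        # illustrative parallel fractions
    for n in (4, 16, 256):
        print(f"p={p:.2f}, n={n:3d}: speedup {amdahl_speedup(p, n):7.2f}")

# Even with p = 0.9, 256 machines give less than a 10x speedup, which is
# the point at which scaling up (or reducing the serial fraction) pays off.
```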
The blog post Scaling Up vs. Scaling Out: Hidden Costs by Jeff Atwood has some interesting points to consider, such as software licensing and power costs.
Not surprisingly, it all depends on your problem. If you can easily partition it into subproblems that don't communicate much, scaling out gives trivial speedups. For instance, searching for a word in 1B web pages can be done by one machine searching 1B pages, or by 1M machines doing 1,000 pages each without a significant loss in efficiency (so with a 1,000,000x speedup). This is called "embarrassingly parallel".
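As a toy illustration of that kind of partitioning (scaled down to a handful of worker processes on one machine, with hypothetical "pages"), a sketch using Python's multiprocessing:

```python
# Hedged sketch: embarrassingly parallel search. Each worker scans its own
# chunk of "pages" independently; no communication is needed until the end.
from multiprocessing import Pool

def count_hits(pages, word="scalability"):
    return sum(word in page for page in pages)

if __name__ == "__main__":
    pages = ["scalability matters", "unrelated text", "scalability again"] * 1000
    chunks = [pages[i::4] for i in range(4)]        # partition across 4 workers
    with Pool(processes=4) as pool:
        total = sum(pool.map(count_hits, chunks))   # only the per-chunk counts are merged
    print("matching pages:", total)
```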
Other algorithms, however, require much more intensive communication between the subparts. Your example requiring cross-analysis is exactly the kind of case where communication can drown out the performance gains of adding more boxes. In these cases, you'll want to keep communication inside a (bigger) box, going over high-speed interconnects rather than something as 'common' as (10-)Gig-E.
Of course, this is a fairly theoretical point of view. Other factors, such as I/O, reliability, and ease of programming (one big shared-memory machine usually gives a lot fewer headaches than a cluster), can also have a big influence.
Finally, due to the (often extreme) cost benefits of scaling out using cheap commodity hardware, the cluster/grid approach has recently attracted much more (algorithmic) research. As a result, new ways of parallelizing have been developed that minimize communication and thus do much better on a cluster, whereas common knowledge used to dictate that these types of algorithms could only run effectively on big iron machines...

Resources