I'm having some difficulty thinking of a good way to assess the scalability of my school project. The assignment is, simply put, a Twitter-like service. Very bare bones. The main goal is to make it as scalable as possible. However, it's equally important to know how and why the assignment scales. So the point is not really to create a very scalable project, but mainly to create a few different architectures for the server and see which one outperforms the others and why.
So far I have the basic server architecture like this:
* 1 main process which holds the data
* 1 unique process server-side per user
A user sends messages to his own server-side process, which then simply delegates those messages to the central process.
To test this I would spawn 1 or more processes which would act as clients. I would spam the server with tweets and then assess how well it can withstand a certain load.
Now, to assess the scalability I came up with the following metrics:
First off, a process is a bottleneck if its message queue is piling up. So I would record the queue length every time a tweet is processed (or every N tweets) and at the end calculate the average. If I run it on more cores and the average queue length goes down, it scales better.
Second, if I create N users to spam the server (on N processes or fewer), I simply time how long it takes the server to process all these tweets.
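A minimal sketch of how both measurements could be taken (module and function names are hypothetical; a helper process polls the central process's queue while the load test runs):

```erlang
%% Hypothetical sketch: run a load test while sampling the central
%% process's message queue length every 100 ms, then report the elapsed
%% time and the average queue length.
-module(scale_probe).
-export([run/2]).

run(CentralPid, LoadFun) ->
    Sampler = spawn(fun() -> sample(CentralPid, []) end),
    {Micros, _Result} = timer:tc(LoadFun),      %% total processing time
    Sampler ! {done, self()},
    receive
        {samples, Samples} ->
            Avg = lists:sum(Samples) / max(length(Samples), 1),
            {Micros, Avg}
    end.

sample(Pid, Acc) ->
    {message_queue_len, Len} = erlang:process_info(Pid, message_queue_len),
    receive
        {done, From} -> From ! {samples, [Len | Acc]}
    after 100 ->
        sample(Pid, [Len | Acc])
    end.
```

Running the same test on 1, 2, 4, ... schedulers (erl +S N) and comparing the two numbers gives exactly the "does the queue shrink with more cores" comparison described above.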
Is there a better way to do this? I can't stop thinking that there should be better metrics...
Edit:
So far I have tried fprof and eprof. These tools, however, show me how much time is spent in certain functions. While this is a good indicator of where I can improve my code, it's not really a good indicator of scalability. It would be better if it showed, for example, the time spent per process.
Look at percept and percept2 if you are really interested in this.
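For example, a hedged sketch of a percept session (the bench:run/1 entry point is a placeholder for your actual load test, and the exact calls may vary between OTP versions):

```erlang
%% Profile one benchmark run with percept, then inspect per-process
%% activity (runnability, blocked time) in the bundled web UI.
percept:profile("twitter_bench.dat",
                {bench, run, [1000]},   %% hypothetical load-test entry point
                [procs]),
percept:analyze("twitter_bench.dat"),
percept:start_webserver(8888).          %% then browse http://localhost:8888
```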
For my internship, I need to implement a blockchain-based solution to manage a drug supply chain. Managing this supply chain involves tracking and tracing (geolocating) a drug along the chain, but also monitoring the storage temperature to check that the cold chain is respected. For that I also intend to use IoT, where a device will feed information to the blockchain solution. However, I have a few questions that I can't find answers to.
The first one is that I don't know whether I should use Ethereum or not, since each time a new block is added (the block representing the update of the product's information in "real time") I will have to pay money. Is there any solution for that? Or do I need to create my own blockchain in JavaScript?
The second question is that I have absolutely no idea where to begin in order to integrate IoT with the blockchain. I searched on research sites, but they only talk about it without presenting any examples...
The third one is more a confirmation than a question: I want to know whether my idea of using IoT to track and manage products in a supply chain can work at a wide scale, since the bigger a blockchain gets, the slower blocks are added because of the consensus mechanism. That means my "real time" tracking might not truly be "on time", since there would be a waiting time before the block is added to the blockchain. If that time is just a few seconds to minutes, there is no problem, but the number of blocks will increase rapidly because of the real-time tracking (I was planning 1 block each minute for each storage site or transport vehicle), so I worry that this scalability problem makes it impracticable.
Thanks in advance to anybody who will help me solve these questions.
#1) Whether you use Ethereum or another blockchain that may be better suited for this purpose is completely up to you. I expect you will get a lot of opinionated answers on this. Ethereum is certainly the most popular blockchain for a use like this, but that does not mean it is the best for your app. Over the last several years we have seen many new blockchains with lower/no fees, faster block times, and increased scalability. I would suggest doing some research into various "Supply Chain", "Enterprise", and "Business" blockchains, as these are likely the type you are looking for and will cost very little in blockchain fees due to them not being as widely used as Ethereum is.
#2) You will have to settle on a blockchain before you can start prototyping or looking for examples, as each one will be different. For storing "log" data for your application there are generally 2 options: storing data in a smart contract on Ethereum (or an Ethereum-like blockchain), or storing data in the OP_RETURN field of a transaction on Bitcoin or a Bitcoin-based blockchain. The latter is likely easier to get started with and simpler to understand: you just put the data in a transaction and send it (to yourself, even).
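If you go the OP_RETURN route, a rough sketch of the idea against a local bitcoind's JSON-RPC interface might look like this (credentials, the funding UTXO, and the payload are all placeholders; you would still sign the result with signrawtransactionwithwallet and broadcast it with sendrawtransaction):

```erlang
%% Hedged sketch: build a raw transaction carrying HexData in an
%% OP_RETURN output via bitcoind's createrawtransaction ("data" output).
-module(op_return_sketch).
-export([create_raw/3]).

create_raw(Txid, Vout, HexData) ->
    application:ensure_all_started(inets),
    Body = io_lib:format(
        "{\"jsonrpc\":\"1.0\",\"id\":\"iot\","
        "\"method\":\"createrawtransaction\","
        "\"params\":[[{\"txid\":\"~s\",\"vout\":~b}],{\"data\":\"~s\"}]}",
        [Txid, Vout, HexData]),
    Auth = "Basic " ++ base64:encode_to_string("rpcuser:rpcpassword"),
    httpc:request(post,
                  {"http://127.0.0.1:8332/",
                   [{"authorization", Auth}],
                   "application/json",
                   lists:flatten(Body)},
                  [], []).
```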
#3) Yes, there are special-purpose blockchains created exactly for this purpose that are meant to ingest large amounts of data and that can scale to meet the needs of an application like you describe. Some blockchains have block times of 1 minute or less, meaning that on average you could update the data every minute if you were willing and able to pay the blockchain fee to include the data in each new block (personally I would suggest a longer interval, such as every 5-10 minutes).
You can use Emercoin NVS technology and the Emercoin (testnet) blockchain to upload your data into it. After creating a "name label" with the name_new command, you can upload a chain of modifications to your name's value. The blockchain has the commands name_show (shows the most recently uploaded value) and name_history (shows the whole chain of uploaded values). You can view/debug your uploaded values in the Emercoin Testnet Explorer, NVS tab.
Regarding "use money".
I can give you (or anyone else) 100 test EMCs, for free. Just write your tEMC address in comments here. 100tEMCs will be enough for ~100,000 records within Testnet. Thus, I think, it is more than enough for your test tasks.
If you need to use your service for production, you need to use the "real blockchain" (with high trust), no testnet. Anyway, EMCs are very cheap right now, and you can buy 100EMSs for ~$5 only. I think, this is not big deal for your organization.
Ask more questions, and I will be happy to assist you here with this technology.
Because you have a permissioned/private environment that does not transfer or store value, a blockchain specialised in value transfer, like Ethereum, is not a good choice.
Choosing a blockchain that is specialised in untrusted data writes simplifies creating your product a lot. Some good choices include:
BigchainDB
HyperLedger SawTooth - comes with a stock supply chain example
I aim to create a browser game where players can set up buildings.
Each building will have several modules (engines, offices, production lines, ...). Each module will eventually have one or more actions running, like the creation of 200 'item X' from ingredients Y and Z.
The game server will be set up with Erlang: an OTP application as the server itself, and Nitrogen as the web front end.
I need persistence of data. I was thinking about the following:
When somebody or something interacts with a building, or a timer representing some production line ends, a supervisor spawns a gen_server (if not already spawned) which loads the state of the building from a database, so the gen_server can answer messages like 'add this module', 'start this action', 'store this production to warehouse', 'die', etc.
But when a building doesn't receive any messages for X seconds or minutes, it will terminate (thanks to the gen_server timeout feature) and write its current state back to the database.
So, as it will be a (soft) real-time game, the gen_server must start up very quickly. I was thinking of Membase as the database, because it's known to have very good response times.
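Something like this minimal sketch of a building process (db_load/1 and db_save/1 stand in for the real database calls):

```erlang
%% Sketch: state is loaded when the process is spawned, every reply
%% re-arms an idle timeout, and the state is written back on terminate.
-module(building).
-behaviour(gen_server).
-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-define(IDLE_MS, 60000).                 %% stop after 60 s of silence

start_link(BuildingId) ->
    gen_server:start_link(?MODULE, BuildingId, []).

init(BuildingId) ->
    {ok, db_load(BuildingId), ?IDLE_MS}.

handle_call({add_module, Mod}, _From, Modules) ->
    {reply, ok, [Mod | Modules], ?IDLE_MS}.

handle_cast(_Msg, Modules) ->
    {noreply, Modules, ?IDLE_MS}.

handle_info(timeout, Modules) ->         %% idle too long: shut down
    {stop, normal, Modules}.

terminate(_Reason, Modules) ->
    db_save(Modules).                    %% drop state back to the database

code_change(_OldVsn, Modules, _Extra) ->
    {ok, Modules}.

%% Placeholders for the real persistence layer.
db_load(_BuildingId) -> [].
db_save(_Modules) -> ok.
```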
My question is: when a gen_server is up and running, its state fills some memory, and this state is present in the memory handled by Membase too, so the state uses twice its size in memory. Is that a bad design?
Is Membase a good solution to handle persistence in my case? Would Mnesia be a better choice, or something else?
I fear Mnesia's 2 GB (or 4?) table size limit, because I don't know at the moment the average state size of my gen_servers (buildings in this example, but also players, production lines, whatever), and I may someday have more than 1 player :)
Thank you
I agree with Hynek -Pichi- Vychodil. Riak is a great thing for key-value storage.
We use Riak for almost (95%) the same thing as you describe. Everything works so far without any issues. In case you hit Riak's performance limits: add more nodes and it's good to go!
Another cool thing about Riak is its very low performance degradation over time. You can find more information about benchmarking Riak here: http://joyeur.com/2010/10/31/riak-smartmachine-benchmark-the-technical-details/
In case you go with it:
a driver: https://github.com/basho/riak-erlang-client
a connection pool you may need to work with it: https://github.com/dweldon/riakpool
About Membase and memory usage: I also tried Membase, but I found that it is not suitable for my tasks (Membase declares fault tolerance, but I could not set it up so that it actually handled faults; even with help from the Membase guys I didn't succeed). So at the moment I use the following architecture: all players that are online and playing the game are represented as player processes (gen_server). All data and business logic for each player lives in its player process. From time to time each player process decides to save its state in Riak.
So far it seems to be a very fast and efficient approach.
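For what it's worth, saving a player process's state with the riak-erlang-client linked above looks roughly like this (bucket and key are made up, and a real system would reuse one connection instead of opening one per save):

```erlang
%% Hedged sketch: serialize the gen_server state and put it under the
%% player's id in a "players" bucket.
save_player(PlayerId, State) ->
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    Obj = riakc_obj:new(<<"players">>, PlayerId, term_to_binary(State)),
    ok = riakc_pb_socket:put(Pid, Obj),
    riakc_pb_socket:stop(Pid).
```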
Update: we are now on PostgreSQL. It is awesome!
You can look at Bitcask or other Riak backends to store your data. Avoiding IPC is definitely a good idea, so keep it inside Erlang.
I need to implement a draft application for a fantasy sports website. Each user will have 1 minute 30 seconds to choose a player for his team, and if that time elapses one will be selected automatically. Our planned implementation will use Juggernaut to push the turn changes to each user participating in the draft. But I'm still not sure how to handle latency.
The main issue here is that if a user has higher latency than the others, he will receive the turn changes a little later and his timer won't be synchronized. Say someone receives a turn change after choosing a player himself, while on his side he thinks he still has 2 seconds left: how can we handle that case? Is it better to try to measure each user's latency and adjust the client-side timer to minimize the issue? If so, how could we implement that?
This is a tricky issue, but there are some good solutions out there. Look into what time.gov does, and how it does it; essentially, as I understand it, they use Java to perform multiple repeated requests to the server, to attempt to get an idea of the latency involved in the communication, then they generate a measure of latency that they use to skew the returned time data. You could use the same process for your application, with even more accuracy; keeping track of what the latency is and how it varies over time lets you make some statistical inferences about how reliable your latency numbers are, etc. It can be a bit complex, but it can definitely allow you to smooth out your performance. My understanding is that this is what most MMOs do as well, to manage lag.
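A toy version of that idea (in Erlang, like the rest of this page; PingFun is a placeholder for whatever round trip your transport gives you):

```erlang
%% Sample the round trip N times, take the median, and use half of it as
%% the one-way latency estimate to skew the client-side countdown.
estimate_skew(PingFun, N) when N > 0 ->
    Samples = [element(1, timer:tc(PingFun)) || _ <- lists:seq(1, N)],
    Median = lists:nth((N + 1) div 2, lists:sort(Samples)),
    Median div 2.   %% microseconds to shave off the displayed timer
```

Tracking how much the samples vary over time also tells you how much to trust the estimate, as described above.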
For security reasons I have a feeling that the testing should be done server side. Nonetheless, that would be rather taxing on the server, right? Given the gear and buffs a player is wearing, they will have a higher movement speed, so each time they move I would need to calculate that new constant and see if their movement is legitimate (we're using TCP, so I don't need to worry about lost or unordered packets). I realize I could instead just save the last movement speed and only recalculate it if they've changed something affecting their speed, but even then that's another check.
Another idea I had is that the server randomly picks data that the client is sending, verifies it, and gives each client a trust rating. A low enough trust rating would mean every message from the client would be inspected and all of their actions would be logged in a more verbose manner. I would then know they're hacking by inspecting the logs, and could ban/suspend them as well as undo any benefits they may have spread around through hacking.
Any advice is appreciated, thank you.
Edit: I just realized there's also the case where a hacker could send tiny movements (within the capability of their regular speed) in very high succession. Each individual movement alone would be legitimate, but the cumulative effect would be speed hacking. What are some ways around this?
The standard way to deal with this problem is to have the server calculate all movement. The only thing that the clients should send to the server are commands, e.g. "move left" and the server should then calculate how fast the player moves etc., then finally send the updated position back to the client.
If you leave any calculation at all on the client, the chances are that sooner or later someone will find a way to cheat.
[...] testing should be done server side. Nonetheless, that would be rather taxing on the server, right?
Nope. This is the way to do it. It's the only way to do it. All talk of checking trust or whatever is inherently flawed, one way or another.
If you're letting the player send positions (a sketch follows this list):
* Check where someone claims they are.
* Compare that to where they were a short while ago. Allow a tiny bit of deviation to account for network lag.
* If they're moving too quickly, reposition them somewhere more reasonable. Small errors may be due to long periods of lag, so clients should use interpolation to smooth out these corrections.
* If they're moving far too quickly, disconnect them. And check for bugs in your code.
* Don't forget to handle legitimate traversals over long distances, e.g. teleports.
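A sketch of those checks (all names and thresholds here are arbitrary; note that because the allowed distance scales with elapsed time, a burst of tiny individually legal moves adds up and is caught the same way):

```erlang
%% Hypothetical server-side position check. MaxSpeed is in units per
%% millisecond; ElapsedMs is the time since the last accepted update.
validate_move(NewPos, LastPos, MaxSpeed, ElapsedMs) ->
    Dist = distance(NewPos, LastPos),
    Allowed = MaxSpeed * ElapsedMs,
    if
        Dist =< Allowed * 1.1 -> accept;             %% small slack for lag
        Dist =< Allowed * 3.0 -> {snap_to, LastPos}; %% suspicious: reposition
        true                  -> disconnect          %% far too fast: kick
    end.

distance({X1, Y1}, {X2, Y2}) ->
    math:sqrt((X1 - X2) * (X1 - X2) + (Y1 - Y2) * (Y1 - Y2)).
```

Legitimate long jumps such as teleports would need to be flagged before this check runs.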
The way around this is that all action is done on the server. Never trust any information that comes from the client. If anybody actually plays your game, somebody will reverse-engineer the communication to the server and figure out how to take advantage of it.
You can't assign a random trust rating, because cautious cheaters will cheat only when they really need to. That gives them a considerable advantage with a low chance of being spotted cheating.
And, yes, this means you can't get by with a low-grade server, but there's really no other method of preventing client-side cheating.
If you are developing in a language that has access to Windows API function calls, I have found from my own studies in speed hacking that you can easily identify a speed hacker by calling two functions and comparing the results.
TimeGetTime
and...
GetTickCount
Both functions return the number of milliseconds since the system started. However, TimeGetTime is much more accurate than GetTickCount: TimeGetTime is accurate to roughly ~1ms, whereas GetTickCount is only accurate to around ~50ms.
Even though there is a small lag between these two functions, if you turn on a speed hacking application (pick your poison), you should see a very large difference between the two result sets, sometimes even up to a couple of seconds. The difference is very noticeable.
Write a simple application that returns the GetTickCount and TimeGetTime results, without the speed hacking application running, and leave it running. Compare the results and display the difference -- you should see a very small difference between the two. Then, with your application still running, turn on the speed hacking application and you will see the very large difference in the two result sets.
The trick is figuring out what threshold constitutes suspicious activity.
I'm working on a web app (Rails 3 based), and I really don't like the time it takes to generate a page: depending on the displayed data it takes up to 2.5 and even 4 seconds.
So I was just wondering what the average reasonable time for generating a page is in your apps. Say you check the generation time: at 750ms you might think "OK, that should be fine even without caching", while at 1.5sec you think "Oh my God, the user won't wait that long and will leave the site".
There's a huge amount of research data regarding the time from query to rendering and the user's experience. I'd recommend reading this useit.com article. After all, Google integrated page speed into its results for a reason ;)
The 3 response-time limits are the same today as when I wrote about them in 1993 (based on 40-year-old research by human factors pioneers):
* 0.1 seconds gives the feeling of instantaneous response — that is, the outcome feels like it was caused by the user, not the computer. This level of responsiveness is essential to support the feeling of direct manipulation (direct manipulation is one of the key GUI techniques to increase user engagement and control — for more about it, see our Principles of Interface Design seminar).
* 1 second keeps the user's flow of thought seamless. Users can sense a delay, and thus know the computer is generating the outcome, but they still feel in control of the overall experience and that they're moving freely rather than waiting on the computer. This degree of responsiveness is needed for good navigation.
* 10 seconds keeps the user's attention. From 1–10 seconds, users definitely feel at the mercy of the computer and wish it was faster, but they can handle it. After 10 seconds, they start thinking about other things, making it harder to get their brains back on track once the computer finally does respond.
A 10-second delay will often make users leave a site immediately. And even if they stay, it's harder for them to understand what's going on, making it less likely that they'll succeed in any difficult tasks.
As a rule of thumb, you should always aim for a balance of optimization time vs. time gained. Don't spend days optimizing the hell out of one routine when your images aren't compressed correctly or your scripts/CSS aren't combined. Yes, faster is better, but a 90% gain in page generation from setting up a smart cache beats a 10% gain after a week of tweaking the algorithm.
Also, don't read too much into the first render time, when the framework has to load everything; instead use stress testing, cached or not, to simulate various situations.
Now, some data: some of the latest sites I worked on used DotNetNuke, a huge open-source CMS, and Asp.Net MVC, where you're nearer to the metal. Average page time with average DB queries was 600-700 milliseconds for DotNetNuke. For Asp.Net MVC, it's 70-100 milliseconds... Users really like the second one :)
There's no 'right' answer to this - the faster the better. Personally I normally aim for < 200ms, although I know from experience that it can be quite difficult to achieve this in Rails on anything but simple apps. Try and figure out where your bottlenecks are and cache what you can.
Edit: There seems to be some confusion between page generation time and page render time. Obviously a quick page render is the goal, and on most sites doing things like reducing HTTP requests, gzipping CSS/JS are where you can get most of your quick wins. But if the page itself can take 4-5 seconds to generate, then you're probably right that your app is where you should start.
It depends on whether nothing is displayed for the 2.5-4 seconds, or whether the user already sees (a part of) the page from the start and it merely finishes loading completely after 2.5-4 seconds. In the latter case the user doesn't experience a 2.5-4 second load. Take the http://www.nytimes.com/ website: I see most of it right away, but according to the Web Inspector it takes 1.94 seconds to load completely.
And keep in mind that the speed will also depend on the browser, computer, and internet connection. What's fast for you might be slower for others.
Measure your Apdex score and see how it is performing. That will give you a rough indication. From there, you can decide how you want to increase performance.
It also depends on what your site is: a system application for a business, or software as a service (SaaS)? If it's a system application, the users are forced to use it, so performance can be negotiated. If it's SaaS, then the lower your Apdex score, the more chance you have of losing your users' interest.
There are a few gems out there that measure performance and report on what your Apdex is.
Here's a little more info: http://apdex.org/blog/?p=630
My personal rule: no page should take more than 0.05 seconds, or you are in trouble.
As long as you write proper code, you don't need to spend much time on optimization to stay under 0.05.
If you stick to giant frameworks, then you are out of luck.