On page 42 of Code Complete there's a checklist of items you might want to consider during the requirements phase.
One of the items (near the bottom of the list) says: "Are minimum machine memory and free disk space specified?"
Has this ever been a requirement in any project you worked on, and how did you define such a requirement before even starting to build anything?
I know this is only a suggestion and frankly I don't think I will ever include that in my requirements, but it got me thinking (and this is the real question)..
How would one ever make an estimation of system requirements...
This is in the requirements phase, so isn't it more about identifying the minimum specification of machine that the application has to run on than estimating the resources your application will use?
I've developed systems for corporate clients where they have standard builds and are able to identify the minimum spec machine that will be used. Often you won't know the minimum specifications of the machines that you will be installing on but you will know the operating systems that you have to support, and may be able to infer them from that.
I have specified this before, but it's always been a ballpark figure using the 'standard' specification of the day. For example, at the moment I would simply say that my app is designed to be deployed to servers with at least 4 GB of RAM, because that's what we develop and test on.
For client apps you might need to get a bit more detailed, but it's generally best to decide on the class of machine you are targeting and then make sure that your app fits within those constraints. Only when your application has particularly high requirements in one area (e.g. if it stores a lot of images, or needs a powerful graphics processor) do you need to go into more detail.
These sure are considerations in the early stages of some projects I've worked on. A lot of scientific codes boil down to working with large matrices. It's often possible to identify early on that code X will need to manipulate a dense matrix with, say, 100,000 rows and columns of complex doubles. Do the sums. Sometimes the answer is (a) pack a PC with RAM, sometimes it is (b) we'll have to parallelise this for memory even if it's not necessary for performance.
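To make "do the sums" concrete for that example (my own illustration, not from the original answer): a dense 100,000 × 100,000 matrix of complex doubles needs roughly 160 GB.

```python
# Back-of-the-envelope sizing for a dense matrix of complex doubles.
rows = cols = 100_000
bytes_per_element = 16                  # complex double = 2 x 8-byte floats
total_bytes = rows * cols * bytes_per_element
print(f"{total_bytes / 1e9:.0f} GB")    # -> 160 GB: hence option (a) or (b)
```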
Sometimes our users would like to checkpoint their programs every N iterations. Checkpointing with very large datasets can use a lot of disk space. Get out your calculator again.
I know it's all very niche, but it matters when it matters.
Machine memory is a tricky one with virtual memory being so common, but disk space isn't that hard depending on the system. We've got a system at work that was built to deal with a number of external devices (accepting input, transforming data and delivering to a customer) and that was fairly easy to size given that we knew the current and projected data volumes that the devices were generating.
You can check how much memory is used by your software during testing, and then estimate how much more you may need if you process bigger chunks: e.g. if you process 1,000 items in your biggest test suite and you need 4 MB, then you will probably need about 4 GB to process 1 million items (assuming memory use grows roughly linearly with input size).
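A minimal sketch of that kind of linear extrapolation, using the numbers above (it assumes memory use really does scale linearly):

```python
# Extrapolate memory use from a test run to a production-sized run,
# assuming roughly linear growth with the number of items processed.
test_items, test_memory_mb = 1_000, 4
target_items = 1_000_000

estimated_mb = test_memory_mb * (target_items / test_items)
print(f"~{estimated_mb / 1024:.1f} GiB needed")   # ~3.9 GiB, i.e. about 4 GB
```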
I've seen software in embedded systems have minimum machine memory requirements - often derived from limitations on the custom built hardware. If the box can only be X by Y by Z dimensions, and has to have other physical requirements satisfied, the limitations on available memory for the software can be absolute and the bare minimum should get set up front.
It's never been a big deal for me in the web app world - after all, there will probably be a new model of the target hardware released before I'm done with the code and memory will be cheaper... so why waste time trying to fit a small size when you can just add on?
I've seen large data projects mention free space - you can really gunk up a system if your database doesn't have some amount of slack to move data around. I've seen requirements that specify bells and whistles and emergency measures to make sure that there is always enough room to keep the database humming.
I am currently planning to set up a service that should (sooner or later) be globally available, with high demands on availability and fault tolerance. There will be both a high read and a high write rate, and the system should be able to scale on demand.
A particular property of my planned service is that the data will be extremely bound to a certain geo-location, e.g. in 99.99% of all cases, data meant for a city in the USA will never be queried from Europe (actually, even data meant for a certain city is unlikely to be queried from the city next to it).
What I want to minimize is:
Administration overhead
Network latency
Unnecessary data replication (I don't want to have a full replication of the data meant for Europe in USA)
In terms of storage technologies, I think my best storage solution would be Cassandra. The options that I see for my use case are:
Use a completely isolated Cassandra cluster per geo-location, combined with a manually configured routing service that chooses the right cluster per insert/select query
Deploy a global cluster and define multiple data centers for certain geo-locations to ensure high availability in those regions
Deploy a global cluster without using data centers
Deploy a global cluster without using data centers and manipulate the partitioning to be geo-aware. My plan here is to set the first 3 bits of the partition key based on the geo-location (e.g. 000: North America, 001: South America, 010: Africa, 011: South/West Europe, etc.) and to assign the remaining bits using a hash algorithm (similar to Cassandra's RandomPartitioner).
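To make option 4 concrete, here is a rough sketch (my own, not a Cassandra API) of what such a geo-prefixed token could look like; note that this layout only keeps a region's data contiguous on the ring if an order-preserving partitioner is used, which is exactly the trade-off in question:

```python
import hashlib

# Hypothetical 3-bit region codes, as sketched in option 4 above.
REGION_BITS = {
    "north_america": 0b000,
    "south_america": 0b001,
    "africa":        0b010,
    "west_europe":   0b011,
}

def geo_aware_token(region: str, natural_key: str, token_bits: int = 64) -> int:
    """Top 3 bits encode the region; the rest come from a hash of the key."""
    hashed = int.from_bytes(hashlib.md5(natural_key.encode()).digest(), "big")
    hashed &= (1 << (token_bits - 3)) - 1
    return (REGION_BITS[region] << (token_bits - 3)) | hashed

# All keys for the same region share the same top bits, so they only stay
# contiguous on the ring under an order-preserving partitioner.
print(hex(geo_aware_token("west_europe", "berlin:user:42")))
```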
The disadvantage of solution 1 would probably be a huge administrative overhead and a lot of manual work; the disadvantage of the second solution would be a huge amount of unnecessary data replication; and the disadvantage of the third solution would be quite high network latency due to random partitioning across the world.
Therefore, in theory, I like solution 4 most. Here I would have a fair amount of administrative overhead, a low amount of unnecessary data replication and decent availability. However, to implement this (as far as I know) I would need the ByteOrderedPartitioner, which many sources strongly advise against.
Is there a way to implement something close to solution 4 without using the ByteOrderedPartitioner, is this a case where the ByteOrderedPartitioner could make sense, or am I missing an obvious fifth solution?
Reconsider option 2.
Not only will it solve your problems; it will also give you geo-redundancy. As you mentioned, you need high availability, and having a copy in a different data center is valuable in case one of the data centers dies.
If you are dead set on avoiding replication between DCs, then that's an option too. You can have multiple DCs over different regions without replicating between them.
Just wondering if anyone has any information on the status of project Rassilon, Neo4j's side project which focuses on improving horizontal scalability of Neo4j?
It was first announced in January 2013 here.
I'm particularly interested in knowing more about when the graph size limitation will be removed and when sharding across clusters will become available.
The node & relationship limits are going away in 2.1, which is the next release post 2.0 (which now has a release candidate).
Rassilon is definitely still in the mix. That said, that work is not taking precedence over things like the significant bundle of new features that are in 2.0. The reason is that Neo4j as it stands today is extremely capable of scaling, using the variety of architecture features outlined below (with some live examples):
www.neotechnology.com/neo4j-scales-for-the-enterprise/
There's lots of cleverness in the current architecture that allows the graph to perform and scale well without sharding, because once you start sharding, you are destined to traverse over the network, which is a bad thing (for latency, query predictability, etc.). So while there are some extremely large graphs that, largely for write-throughput reasons, must trade off performance for uber scale (by sharding), the happy thing is that most graphs don't require this compromise. Sharding is required only in the 1% case, which means that nearly everyone can have their cake and eat it too. There are currently Neo4j clusters in production at customers with 1B+ individuals in their graph, backing web applications with tens of millions of users, using comparatively small (but very fast, very efficient) clusters. To give you some idea of the kind of price-performance we regularly see: we've had users tell us that a single Neo4j instance could do the same work as 10 Oracle instances, only faster.
A well-tuned Neo4j cluster can support upwards of 10K transactional writes per second, and an arbitrarily high number of reads per second. Read throughput scales linearly as instances are elastically plugged in. Cache sharding is a design pattern that ensures that you don't have to keep the entire graph in memory.
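As I understand it, cache sharding is essentially consistent request routing: send requests for the same region of the graph to the same instance so each instance keeps its own slice of the graph warm in cache. A hypothetical sketch of the routing idea (not Neo4j code; instance names are made up):

```python
import hashlib

# Hypothetical Neo4j cluster members behind a load balancer.
INSTANCES = ["neo4j-1:7474", "neo4j-2:7474", "neo4j-3:7474"]

def route(user_id: str) -> str:
    """Consistently send one user's requests to the same instance so that
    instance's cache stays warm for that user's neighbourhood of the graph."""
    bucket = int(hashlib.sha1(user_id.encode()).hexdigest(), 16) % len(INSTANCES)
    return INSTANCES[bucket]

print(route("user-12345"))   # always the same instance for this user
```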
Seaside is known as "the heretical web framework". One of the points that makes it heretical is that it has a lot of shared state. That, however, is something which, in my current understanding, hinders easy scaling.
Ruby on Rails, on the other hand, shares as little state as possible. It has been known to scale pretty well, even if it is dog slow compared to modern Smalltalk VMs. Flickr uses PHP and has scaled to an extremely big infrastructure...
So has anybody some experience in the scaling of Seaside?
Ramon Leon shares some of his experience of scaling Seaside on his (excellent) blog. You can read very concrete ideas, with sample code, about configuring and tuning Seaside.
Enjoy :-)
http://onsmalltalk.com/scaling-seaside-more-advanced-load-balancing-and-publishing
http://onsmalltalk.com/scaling-seaside-redux-enter-the-penguin
http://onsmalltalk.com/stateless-sitemap-in-seaside
Short answer:
you can scale Seaside applications like hell yah
Long answer:
In the IT domain, scaling is one thing, but it has two dimensions:
horizontal
vertical
Almost everybody used to think about scaling in the vertical dimension. That was until Intel and friends hit some physical barriers and started adding cores to compensate for the current impossibility of adding MHz.
That's when we all started to become more aware of scaling horizontally as the way to go.
Why am I telling you this?
Because Seaside is a Smalltalk image running in a VM, and that is roughly the same situation as a system running on a server with a single-core processor.
Taking that as a foundation, you scale web apps by making a cluster of servers. It's the natural thing to do, it's the fault-tolerant thing to do, it's the topologically intelligent thing to do, it's the flexible thing to do; I guess you get the idea...
So, for scaling, you do the same as Intel & friends: you embrace the horizontal way. And it's even cheaper than the vertical way (which will lead you to IBM and Sun servers that are as good as they are expensive).
RoR applications are typically scaled horizontally. Google has countless cheap servers to do their thing. It works perfectly fine no matter how much people want to impress you by throwing a bunch of forgettable Twitter fail whales at you.
If they talk to you about that, just be polite and hear what they say, but remember this:
perfect is the enemy of the good
the unfinished perfect will never be as valuable as the good thing that's done
BTW, Amazon does something like that too (and it kind of couples in geolocation, so it improves the chances of serving your requests with the cluster that is closest to your location).
On the other hand, the way Avi scaled DabbleDB (a Seaside-based web application company bought by Twitter) was to use one VM per customer account (starting up and shutting those down on demand).
Having a lot of state in an image doesn't make scaling impossible nor complicated.
Just different.
The way to go is with a load balancer that uses sticky sessions, so you can have one image attending all the requests of a user session. You arrange things so that any worker image behind the load balancer can serve any user of a given app. And that's pretty much it.
To be able to do that you need to share the persistent objects among workers. All the users' databases need to be accessible by the workers at any time, and they need to deal well with concurrency.
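A toy sketch of that sticky-session idea (worker names are made up; a real deployment would use the balancer's own session-affinity support rather than code like this):

```python
import itertools

# Hypothetical pool of Seaside worker images behind the load balancer.
WORKERS = ["image-a:8081", "image-b:8082", "image-c:8083"]
_next_worker = itertools.cycle(WORKERS)
_assignments: dict[str, str] = {}      # session id -> worker image

def sticky_worker(session_id: str) -> str:
    """Pick a worker round-robin on a session's first request, then keep
    sending that session to the same image for its whole lifetime."""
    if session_id not in _assignments:
        _assignments[session_id] = next(_next_worker)
    return _assignments[session_id]
```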
We designed airflowing to be scalable in that way.
It's economically convenient too, because you can start with N very small (depending on the RAM of your first server) and increase it on demand until you reach the hardware limit.
Once you reach the hardware limit, you just add another host to the cluster and reconfigure the balancer (and the access to the databases).
Simple, economic and elegant.
http://dabbledb.com/ seems to scale quite well. Moreover, one can use GemStone GLASS to run Seaside.
In this interview, Avi Bryant, the creator of Seaside and co-founder of DabbleDB, explains how they make it scale.
From what I understand:
each customer has its own Squeak image.
When a customer connects, Apache decides, based on the user name, which port to send the request to.
Based on the port, it starts that customer's Squeak image.
That way it can grow to an arbitrary number of servers.
I think this solution works for them because of the specifics of their application: customers don't need to share info between each other, so there is no need for a centralized DB.
Anyway, it is better to watch the interview than to rely on my half-made summary.
Yes, Seaside scales down fantastically. A single developer can create and maintain complex applications for small groups very well.
[coming back to this after a few years]
This actually is much more important than scaling up. Computer speed still grows a lot, and 99% of all applications can now run on one machine. Speed of development, and especially of maintenance, now totally dominates TCO.
I would rephrase your question slightly to: does Seaside prevent or discourage you from creating applications that scale? I would say usually no. Seaside doesn't have a default way to store your data (just like PHP on its own doesn't, though Seaside gives you a few extra options), and my impression is that interacting with your data tends to be the biggest hurdle when it comes to scaling.
If you want to store your data in a monolithic SQL db, like with Rails, you can do that. Or you can use an object database. Or you can use a separate object database for each user, or a separate db for each project, or a separate db for each user and project. Or you can store everything in a series of flat files, or you can just store your data as objects in your VM's memory.
And because of continuations you don't need to re-fetch your data from your datastore on every page call. As with a desktop application, you can pull data out of your datastore when your user begins interacting with it, set the appropriate variables, and then use those variables between web calls until the user is done with the data, at which point you update the datastore. When you don't share state, you have to hit the datastore on every single web call.
Of course this doesn't mean scaling is free; it just means you have a larger domain in which to search for scaling solutions.
All that said, for many applications Rails will scale much more easily, simply because there exist large hosting solutions for Rails (and PHP for that matter) that will offer you a huge amount of resources without having to rent and set up a custom box.
These are just my impressions from reading and talking to people.
I was just reminded that there is a link among Pharo's success stories: a Seaside web application with up to 1000 concurrent users for a large health insurer in Argentina.
Pharo success stories
Issys Tracking:
Load balancing: Apache as a proxy/balancer (round robin with session affinity)
Server setup: 5 Pharo images on 3 different servers (Linux and Windows 2003)
GUI: heavily AJAX-based. All code written in Smalltalk: Seaside 3.0, Seaside jQuery integration and JQWidgetBox.
Persistency: Glorp (OR mapper) and OpenDBX (DB client)
Databases: large PostgreSQL and MS SQL Server DBs
From the Wikipedia article, it's a total pig. Prior to that, it hadn't scaled to the point where I'd heard of it. :)
Most of the literature on virtual memory points out that, as an application developer, understanding virtual memory can help me harness its powerful capabilities. I have been developing applications on Linux for some time, but I haven't cared about virtual memory intricacies while I code. Am I missing something? If so, please shed some light on how I can leverage the workings of virtual memory. Otherwise, let me know if I am not making sense with the question!
Well, the concept is pretty simple, actually. I won't repeat it here, but you should pick up any book on OS design and it will be explained there. I recommend "Operating System Concepts" by Silberschatz and Galvin; it's what I had to use at university, and it's good.
A couple of things I can think of that virtual memory knowledge might give you:
Learning to allocate memory on page boundaries to avoid waste (applies only to virtual memory, not the usual heap/stack memory);
Lock some pages in RAM so they don't get swapped to HDD (see the sketch at the end of this answer);
Guard pages;
Reserving some address range and committing actual memory later;
Perhaps using the NX (non-executable) bit to increase security, but I'm not sure about this one;
PAE for accessing more than 4 GB of RAM on a 32-bit system.
Still, all of these things would have uses only in quite specific scenarios. Indeed, 99% of applications need not concern themselves about this.
Added: That said, it's definitely good to know all these things, so that you can identify such scenarios when they arise. Just beware: with power comes responsibility.
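As one concrete (Linux-specific) illustration of the "lock some pages in RAM" point from the list above, here is a sketch that page-aligns a buffer with mmap and pins it with libc's mlock via ctypes; this is my own example, not something from the answer:

```python
import ctypes
import ctypes.util
import mmap

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

# Anonymous mmap-backed memory is always page-aligned.
length = mmap.PAGESIZE
buf = mmap.mmap(-1, length)
c_buf = ctypes.c_char.from_buffer(buf)           # exposes the buffer's address

# Ask the kernel to keep this page resident so it is never swapped out.
addr = ctypes.addressof(c_buf)
if libc.mlock(ctypes.c_void_p(addr), ctypes.c_size_t(length)) != 0:
    raise OSError(ctypes.get_errno(), "mlock failed (limits/CAP_IPC_LOCK?)")

buf[:16] = b"locked in memory"                   # use the pinned page
libc.munlock(ctypes.c_void_p(addr), ctypes.c_size_t(length))
del c_buf                                        # release export before close
buf.close()
```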
It's a bit of a vague question.
The way you can use virtual memory is chiefly through memory-mapped files. See the mmap() man page for more details.
You are probably using it implicitly anyway, as any dynamic library is implemented as a mapped file, and many database libraries use them too.
The interfaces for using mapped files from higher-level languages are often quite inconvenient, which makes them less useful.
The chief benefits of using mapped files are:
No system call overhead when accessing parts of the file (this actually might be a disadvantage, as a page fault probably has as much overhead anyway, if it happens)
No need to copy data from OS buffers to application buffers - this can improve performance
Ability to share memory between processes.
Some drawbacks are:
32-bit machines can run out of address space easily
Tricky to handle file extending correctly
No easy way to see how many / which pages are currently resident (there may be some ways however)
Not good for real-time applications, as a page fault may cause an IO request, which blocks the thread (the file can be locked in memory, but only if there is enough physical memory).
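A minimal sketch of the memory-mapped file idea from this answer, via Python's wrapper around mmap() (the filename is made up):

```python
import mmap

# Map an existing file read-only: pages are faulted in lazily on access,
# and the data is served straight from the OS page cache (no extra copy).
with open("measurements.bin", "rb") as f:              # hypothetical file
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        header = m[:16]          # plain slicing, no read() syscall per access
        print(len(m), header)
```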
In maybe 9 out of 10 cases you need not worry about virtual memory management; that's the job of the kernel. Only in some highly specialized applications do you need to tweak around it.
I know of one article that talks about computer memory management with an emphasis on Linux [ http://lwn.net/Articles/250967 ]. Hope this helps.
For most applications today, the programmer can remain unaware of the workings of computer memory without any harm. But sometimes -- for example the case when you want to improve the footprint of your program -- you do end up having to manipulate memory yourself. In such situations, knowing how memory is designed to work is essential.
In other words, although you can indeed survive without it, learning about virtual memory will only make you a better programmer.
And I would think the Wikipedia article can be a good start.
If you are concerned with performance -- understanding memory hierarchy is important.
For small data sets which are fully contained in physical memory you need to be concerned with caching (accessing memory from the cache is much faster).
When dealing with large data sets -- which may be paged out due to lack of physical memory you need to be careful to keep your access patterns localized.
For example, if you declare a matrix in C (int a[rows][cols]), it is laid out row by row. Thus, when scanning the matrix, you should scan by rows rather than by columns; otherwise you will be paging the same data in and out many times.
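The same locality principle is easy to see outside C as well, whether the cost is cache misses or page faults. An illustrative (not benchmark-grade) NumPy sketch, where the array is C-ordered so rows are contiguous:

```python
import time
import numpy as np

n = 5_000
a = np.ones((n, n))              # C order: each row is contiguous in memory

t0 = time.perf_counter()
for i in range(n):
    a[i, :].sum()                # row access: sequential, cache/page friendly
t1 = time.perf_counter()
for j in range(n):
    a[:, j].sum()                # column access: one element every 40,000 bytes
t2 = time.perf_counter()

print(f"by rows:    {t1 - t0:.2f} s")
print(f"by columns: {t2 - t1:.2f} s")   # typically several times slower
```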
Another issue is the difference between dirty and clean data held in memory. Clean data is information loaded from file that was not modified by the program. The OS may page out clean data (perhaps depending on how it was loaded) without writing it to disk. Dirty pages must first be written to the swap file.
Brief description of requirements
(Lots of good answers here, thanks to all, I'll update if I ever get this flying).
A detector runs along a track, measuring several different physical parameters in real time (deterministic), as a function of curvilinear distance. The user can click a button to 'mark' waypoints during this process, then uses the GUI to enter the details for each waypoint (in human time, while the data acquisition continues).
Following this, the system performs a series of calculations/filters/modifications on the acquired data, taking into account the constraints entered for each waypoint. The output of this process is a series of corrections, also as a function of curvilinear distance.
The third part of the process involves running along the track again, but this time writing the corrections to a physical system which corrects the track (still as a function of curvilinear distance).
My current idea for your input/comments/warnings
What I want to determine is whether I can do this with a PC + FPGA. The FPGA would do the data acquisition, and I would use C# on the PC to read the data from a buffer. The waypoint information could be entered via a WPF/WinForms application and stored in a database/flat file/anything, pending 'processing'.
For the processing, I would use F#.
The FPGA would then be used for writing the information back to the physical machine.
The one problem that I can foresee currently is if the processing algorithms require a sampling frequency which makes the quantity of data to buffer too big. This would imply offloading some of the processing to the FPGA, at least the bits that don't require user input. Unfortunately, the only pre-processing algorithm is a Kalman filter, which is difficult to implement on an FPGA, from what I have googled.
I'd be very grateful for any feedback you care to give.
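For a sense of scale, a one-dimensional Kalman filter update is only a few multiplies, adds and one divide per sample; here is a scalar sketch (my own illustration, with made-up noise parameters), which suggests it is little work for either side at 1 kHz:

```python
import random

def kalman_1d(measurements, q=1e-5, r=1e-2):
    """Minimal scalar Kalman filter: track one slowly varying value.
    q = process-noise variance, r = measurement-noise variance (made up)."""
    x, p = 0.0, 1.0                # initial estimate and its variance
    for z in measurements:
        p = p + q                  # predict: uncertainty grows
        k = p / (p + r)            # Kalman gain
        x = x + k * (z - x)        # correct with the new measurement
        p = (1.0 - k) * p          # shrink uncertainty
        yield x

noisy = [1.0 + random.gauss(0, 0.1) for _ in range(1000)]   # fake 1 kHz data
print(list(kalman_1d(noisy))[-1])                            # close to 1.0
```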
UPDATES (extra info added here as and when)
At the entrance to the Kalman filter we're looking at one sample every 1 ms. But on the other side of the Kalman filter we would be sampling every 1 m, which at the speeds we're talking about would be about 2 samples a second.
So I guess more precise questions would be:
implementing a Kalman filter on an FPGA: it seems that it's possible, but I don't understand enough about either subject to work out just HOW possible it is.
I'm also not sure whether an FPGA implementation of a Kalman filter will be able to cycle every 1 ms, though I imagine it should be no problem.
If I've understood correctly, FPGAs don't have huge amounts of memory. For the third part of the process, where I would be sending an (approximately) 4 x 400 array of doubles to use as a lookup table, is this feasible?
Also, would swapping between the two processes (reading/writing data) imply re-programming the FPGA each time, or could it be instructed to switch between the two? (It might be possible to just run both in parallel and ignore one or the other.)
Another option I've seen is compiling F# to VHDL using Avalda FPGA Developer, I'll be trying that soon, I think.
You don't mention your goals, customers, budget, reliability or deadlines, so this is hard to answer, but...
Forget the FPGA. Simplify your design, development environment and interfaces unless you know you are going to blow your real-time requirements with another solution.
If you have the budget, I'd first take a look at LabVIEW.
http://www.ni.com/labview/
http://www.ni.com/dataacquisition/
LabVIEW would give you the data acquisition system and user GUI all on a single PC. In my experience, developers don't choose LabVIEW because it doesn't feel like a 'real' programming environment, but I'd definitely recommend it for the problem you described.
If you are determined to use compiled languages, then I'd isolate the real-time data acquisition component on an embedded target with an RTOS, preferably one that takes advantage of the MMU for scheduling and thread isolation and lets you write in C. If you get a real RTOS, you should be able to reliably schedule the processes that need to run, and also be able to debug them if need be! Keep this off-target system as simple as possible, with defined interfaces. Make it do just enough to get the data you need.
I'd then implement the interfaces back to the PC GUI using a common interface file for maintenance. Use standard interfaces for data transfer to the PC, something like USB2 or Ethernet. The FTDI chips are great for this stuff.
Since you are moving along a track, I have to assume the sampling frequency isn't more than 10 kHz. You can offload the data to the PC at that rate easily, even over 12 Mbit/s USB (full speed).
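A rough check of that claim (channel count and sample width are my assumptions, not from the question):

```python
# Would raw samples fit through full-speed USB?
sample_rate_hz = 10_000        # the upper bound assumed above
channels = 8                   # assumption: "several physical parameters"
bytes_per_sample = 8           # assumption: one double per channel

payload_mbps = sample_rate_hz * channels * bytes_per_sample * 8 / 1e6
print(f"{payload_mbps:.1f} Mbit/s of the 12 Mbit/s bus")   # ~5.1 Mbit/s
```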
For serious processing of math data, MATLAB is the way to go. But since I haven't heard of F#, I can't comment.
4 x 400 doubles is no problem (4 × 400 × 8 bytes ≈ 12.8 kB); even low-end FPGAs have hundreds of kilobits of block RAM.
You don't have to change images to swap between reading and writing. That is done all the time in FPGAs.
Here is a suggestion.
Dump the FPGA concept.
Get a DSP evaluation board from TI
Pick one with enough gigaflops to make you happy.
Enough RAM to store your working set.
Program it in C. TI supply a small RT kernel.
It talks to the PC over, say a serial port or ethernet, whatever.
It sends the PC cooked data with a handshake so the data doesn't get lost.
There is enough RAM in the DSP to store your data while the PC has senior moments.
No performance problems with the DSP.
The real-time part does the real-time work, with MBs of RAM.
Processing is fast, and the GUI is not time-critical.
What is your connection to the PC? .NET will be a good fit if it is a network-based connection, as you can use streams to deal with the data input.
My only warning to you regarding F# or any functional programming language involving large data sets is memory usage. They are wonderful and mathematically provable, but when you get a stack overflow exception from too many recursive calls, it means that your program won't work, and you lose time and effort.
C# will be great if you need to develop a GUI; WinForms and GDI+ should get you to something usable without a monumental effort.
Give us some more information regarding data rates and connection and maybe we can offer some more help?
There might be something useful in Microsoft Robotics Studio: link text, especially for the real-time aspect. The CCR (Concurrency Coordination Runtime) has a lot of this thought out already, and the simulation tools might help you build a model that would help your analysis.
Sounds to me like you can do all the processing offline. If that's the case, then offline is the way to go. In other words, divide the process into 3 steps:
Data acquisition
Data analysis
Physical system corrections based on the data analysis.
Data Acquisition
If you can't collect the data using a standard interface, then you probably have to go with a custom interface. It's hard to say whether you should be using an FPGA without knowing more about your interface. Building custom interfaces is expensive, so you should do a trade-off study to select the approach. Anyway, if this is FPGA-based, then keep the FPGA simple and use it for raw data acquisition. With current hard drive technology you can easily store hundreds of gigabytes of data for post-processing, so store the raw data on a disk drive. There's no way you want to be implementing even a one-dimensional Kalman filter in an FPGA if you don't have to.
Data Analysis
Once you've got the data on a hard drive, then you have lots of options for data analysis. If you already know F#, then go with F#. Python and Matlab both have lots of data analysis libraries available.
This approach also makes it much easier to test your data analysis software than a solution where you have to do all the processing in real time. If the results don't seem right, you can easily rerun the analysis without having to go and collect the data again.
Physical System Corrections
Take the results of the data analysis and run the detector along the track again feeding it the appropriate inputs through the interface card.
I've done a lot of embedded engineering, including hybrid systems such as the one you've described. At the data rates and sizes you need to process, I doubt that you need an FPGA ... simply find an off-the-shelf data acquisition system to plug into your PC.
I think the biggest issue you're going to run into is more related to language bindings for your hardware APIs. In the past, I've had to develop a lot of my software in C and assembly (and even some Forth) simply because that was the easiest way to get the data from the hardware.