How is Apache Thrift scalable?

On its website, Apache Thrift is introduced as a
software framework, for scalable cross-language services
development...
but I couldn't find what makes it scalable. So my question is: what makes it scalable? Does just using Thrift make your application scalable? If not, how do I use Thrift in a scalable way?

"Scalability", in this context, means the ability to partition the application into as many or as few pieces, running on as few or as many different processors, as necessary. The same app can be "built out" simply by adding hardware.
From the Thrift white paper:
https://thrift.apache.org/static/files/thrift-20070401.pdf
Thrift has enabled Facebook to build scalable backend services
efficiently by enabling engineers to divide and conquer. Application
developers can focus on application code without worrying about the
sockets layer. We avoid duplicated work by writing buffering and I/O
logic in one place, rather than interspersing it in each application.
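A minimal sketch of what "partitioning" can look like from the client side, assuming a stateless service deployed as several identical instances and a client that picks one by hashing a request key. The host names and the `pick_instance` helper are hypothetical and are not part of Thrift; as far as I know, Thrift gives you the generated client/server code and the transport, while choosing an instance is left to you.

```python
import hashlib

def pick_instance(key, instances):
    """Map a request key to one of the service instances (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]

# Hypothetical host:port pairs, each assumed to run the same service
small_cluster = ["svc-a:9090", "svc-b:9090"]
large_cluster = small_cluster + ["svc-c:9090", "svc-d:9090"]

for user in ["alice", "bob", "carol"]:
    print(user, "->", pick_instance(user, small_cluster), "|", pick_instance(user, large_cluster))
```

"Building out" then amounts to appending hosts to the list. With plain modulo hashing most keys move to a different instance when the list grows, which is harmless for stateless services but matters for stateful ones.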

Related

Is SUAVE production ready for web application development with millions of user traffic?

We are a startup currently evaluating SUAVE with F# as our web application development framework. I am very enthusiastic about using the SUAVE framework to develop my applications.
I just want to know whether SUAVE is production ready, whether any performance benchmarking has been done comparing it to OWIN for concurrent users, and how much user traffic the web server can handle.

Although this thread is now 8 months old, I wanted to share my experience with using Suave as a web server.
First, measuring performance based on simple benchmarks won't tell you the truth about the overall performance of a more complicated system.
However, when using Suave, it's unlikely that it will be the bottleneck in your application.
It depends a lot more on the entire architecture, the sum of mechanics between request and response, and implementation details (e.g. random access on Lists is rather slow).
I have used Suave in 3 projects now, always with great success.
All of them made heavy use of parallelization and multi-threading.
Two of them were simply run directly by Suave behind an Nginx proxy; one used IIS.
Running under IIS did not have any measurable influence on performance.
When I came across performance issues, Suave was never the place to look for them.
When utilizing the awesome concurrency and parallelization features of F#, your application will benefit from vertical scaling.
For example, I built an image processing service which performed rather badly on AWS, but great on a notebook with a quad-core Pentium processor.
But again, this has nothing to do with Suave.
Actually, it pretty much stays out of your way.
Suave itself is a great and solid choice. In about 2 years, I did not run into edge cases where Suave was the cause of trouble.
I have to mention that my experiences are based on simple web servers and services.
Suave was used for a fairly flat web layer to serve RPC or REST APIs.
Other tasks, like streaming or soft-realtime applications, may require another approach and might not be well suited to Suave.

Neo4j Standalone vs Embedded server?

I want to know the difference between these two deployment modes of Neo4j. Of course, the names of both are self-explanatory, but what are the main differences?
What factors should be considered in deciding which one to use in a project?
What are the pros and cons of each?
P.S. Sorry if this is a repeat question, but I searched and was not able to find any question that answers mine.

Because the standalone server is built on the embedded server, the general rule of thumb is that the embedded server is more capable and has (obviously) lower latency. Either can operate in High-Availability mode, allow monitoring, and even accept connections from the neo4j-shell. With the server though, you get more functionality out-of-the-box, like remoting, basic visualization, monitoring interface, etc.
The differences are otherwise the practical ones you'd imagine. Choosing a deployment approach is influenced by two things:
Language - embedded mode requires that you're implementing your application with a JVM compatible language. The server supports any language/framework that can send HTTP requests.
Hardware - sharing physical resources between your application and Neo4j can be demanding. Scaling may argue for a dedicated machine to split out the persistence layer. The server obviously has a remote API to support segmenting your application.
It's otherwise difficult to give guidance without a specific usage scenario. Deploying into an existing Service Oriented Architecture? Probably server. Running on a copier machine? Go embedded. From-scratch web application? What's the rest of your stack?

Integrating a Thrift ruby server and a Ruby on Rails web app

In my free time, I'm currently working on a web app written with Rails, and planning on writing "thick" clients for the desktop and various mobile platforms (who doesn't?).
I like the concept of Thrift for its multi-language support, and the idea of having one IDL file generate the appropriate code for clients (DRY!).
I was wondering what the best way / architecture would be to integrate the Thrift server and Rails.
The only options that come to mind seem sub-optimal:
call the webapp APIs from the Thrift server to return data to the thick clients
plug the Thrift server into the DB of the Rails app and let it do its thing.
For obvious reasons, this seems overkill, redundant and inflexible.
Any suggestions?
Thanks!

I'm not sure if it's overkill :) But if you want to explore this topic further, I suggest you also look into this thread.

Where is Erlang used and why? [closed]

I would like to see a list of the most common applications/websites/solutions where Erlang is used, successfully or not.
An explanation of why it is used in a specific solution instead of other programming languages would be very much appreciated, too.
A list of BAD Erlang case studies (cases in which Erlang is misused) would be interesting as well.

From Programming Erlang:
Many companies are using Erlang in their production systems:
• Amazon uses Erlang to implement SimpleDB, providing database services as a part
of the Amazon Elastic Compute Cloud (EC2).
• Yahoo! uses it in its social bookmarking service, Delicious, which has more than
5 million users and 150 million bookmarked URLs.
• Facebook uses Erlang to power the backend of its chat service, handling more than
100 million active users.
• WhatsApp uses Erlang to run messaging servers, achieving up to 2 million connected users per server.
• T-Mobile uses Erlang in its SMS and authentication systems.
• Motorola is using Erlang in call processing products in the public-safety industry.
• Ericsson uses Erlang in its support nodes, used in GPRS and 3G mobile networks
worldwide.
The most popular open source Erlang applications include the following:
• The 3D subdivision modeler Wings 3D, used to model and texture polygon
meshes.
• The Ejabberd system, which provides an Extensible Messaging and Presence Protocol
(XMPP) based instant messaging (IM) application server.
• The CouchDB “schema-less” document-oriented database, providing scalability
across multicore and multiserver clusters.
• The MochiWeb library that provides support for building lightweight HTTP servers.
It is used to power services such as MochiBot and MochiAds, which serve
dynamically generated content to millions of viewers daily.
• RabbitMQ, an AMQP messaging protocol implementation. AMQP is an emerging
standard for high-performance enterprise messaging.

ejabberd is one of the best-known Erlang applications and the one I learnt Erlang with.
I think it's one of the most interesting projects for learning Erlang because it really builds on Erlang's strengths. (However, some will argue that it's not OTP, but don't worry, there's still a trove of great code inside...)
Why?
An XMPP server (like ejabberd) can be seen as a high-level router, routing messages between end users. Of course there are other features, but this is the most important aspect of an instant messaging server. It has to route many messages simultaneously and handle a large number of TCP/IP connections.
So we have 2 features:
handle many connections
route messages given some aspects of the message
These are examples where Erlang shines.
handle many connections
It is very easy to build scalable non-blocking TCP/IP servers with Erlang. In fact, it was designed to solve this problem.
And given that it can spawn hundreds of thousands of processes (processes, not threads: it's a share-nothing approach, which is simpler to design), ejabberd is designed as a set of Erlang processes (which can be distributed over several servers):
client connection process
router process
chatroom process
server to server processes
All of them exchanging messages.
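To make the "processes exchanging messages" idea concrete, here is a rough sketch in Python rather than Erlang, so it is only an analogy: Python threads share memory, and the sketch merely imitates share-nothing by letting the parties communicate through queues only. The `router`, `drain` and `run_demo` names are made up for illustration and have nothing to do with ejabberd's actual code.

```python
import queue
import threading

def router(inbox, mailboxes):
    """Router 'process': forwards (recipient, text) pairs until it receives None."""
    while True:
        message = inbox.get()
        if message is None:  # shutdown signal
            break
        recipient, text = message
        mailboxes[recipient].put(text)

def drain(mailbox):
    """Collect everything currently waiting in a mailbox."""
    items = []
    while True:
        try:
            items.append(mailbox.get_nowait())
        except queue.Empty:
            return items

def run_demo():
    inbox = queue.Queue()
    mailboxes = {"alice": queue.Queue(), "bob": queue.Queue()}
    worker = threading.Thread(target=router, args=(inbox, mailboxes))
    worker.start()
    for message in [("alice", "hello"), ("bob", "hi"), ("alice", "bye"), None]:
        inbox.put(message)
    worker.join()  # once the router exits, every message has been forwarded
    return {name: drain(box) for name, box in mailboxes.items()}

print(run_demo())
```

The router never touches the clients' state directly; it only puts messages into their mailboxes, which is the property that makes this style easy to spread over several machines.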
route messages given some aspects of the message
Another very lovable feature of Erlang is pattern matching.
It is used throughout the language.
For instance, in the following:
access(moderator, _Config)-> rw;
access(participant, _Config)-> rw;
access(visitor, #config{type="public"})-> r;
access(visitor, #config{type="public_rw"})-> rw;
access(_User,_Config)-> none.
That's 5 clauses of the access function.
Erlang tries the clauses in order and runs the first one whose patterns match the arguments received. (Config is a record of type #config which has a type field.)
That makes business rules much easier to write and much clearer than chaining if/else or switch/case.
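For readers who don't know Erlang, the access function above can be approximated in Python as a table of clauses that is scanned top to bottom, with the first match winning (which is how Erlang picks a function clause). This is only an analogy: the `Config` class and the `CLAUSES` table are assumptions made for the sketch, and `None` stands in for Erlang's `_` wildcard.

```python
from dataclasses import dataclass

@dataclass
class Config:
    type: str

# (role pattern, config type pattern, result); None matches anything
CLAUSES = [
    ("moderator", None, "rw"),
    ("participant", None, "rw"),
    ("visitor", "public", "r"),
    ("visitor", "public_rw", "rw"),
    (None, None, "none"),
]

def access(role, config):
    """Return the result of the first clause whose patterns match."""
    for role_pattern, type_pattern, result in CLAUSES:
        if role_pattern in (None, role) and type_pattern in (None, config.type):
            return result
    raise ValueError("no clause matched")

print(access("visitor", Config("public")))
print(access("visitor", Config("private")))
```

Because order matters, the catch-all clause has to come last, exactly as in the Erlang version.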
To wrap up
Writing scalable servers: that's the whole point of Erlang. Everything is designed to make this easy. To the two previous features, I'd add:
hot code upgrade
Mnesia, a distributed relational database (included in the base distribution)
MochiWeb, on which most Erlang HTTP servers are built
binary support (which makes decoding and encoding binary protocols easy)
a great community with great open source projects (ejabberd, CouchDB, but also Webmachine, Riak and a slew of libraries that are very easy to embed)
Fewer LOCs
There is also this article from Richard Jones. He rewrote an application from C++ to Erlang: 75% fewer lines in Erlang.

The list of the most common applications for Erlang has been covered (CouchDB, ejabberd, RabbitMQ etc.) but I would like to contribute the following.
The reason why it is used in these applications comes from the core strength of Erlang: managing application availability.
Erlang was built from the ground up for the telco environment, which requires that systems meet at least 5x9's availability (99.999% yearly up-time). This figure doesn't leave much room for downtime during a year! Primarily for this reason, Erlang comes loaded with the following features (non-exhaustive):
Horizontal scalability (ability to distribute jobs across machine boundaries easily through seamless intra & inter machine communications). The built-in database (Mnesia) is also distributed by nature.
Vertical scalability (ability to distribute jobs across processing resources on the same machine): SMP is handled natively.
Code Hot-Swapping: the ability to update/upgrade code live during operations
Asynchronous: the real world is async so Erlang was built to account for this basic nature. One feature that contributes to this requirement: Erlang's "free" processes (>32000 can run concurrently).
Supervision: many different strategies for process supervision with restart strategies, thresholds etc. Helps recover from corner-cases/overloading more easily whilst still maintaining traces of the problems for later trouble-shooting, post-mortem analysis etc.
Resource Management: scheduling strategies, resource monitoring etc. Note that the default process scheduler operates with O(1) scaling.
Live debugging: the ability to "log" into live nodes at will helps trouble-shooting activities. Debugging can be undertaken live with full access to any process' running state. Also the built-in error reporting tools are very useful (but sometimes somewhat awkward to use).
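As an illustration of the supervision item in the list above, here is a small Python sketch of the general idea: restart a failing worker, keep a trace of each crash, and give up once a restart threshold is exceeded. It is not OTP's supervisor API; `supervise` and `make_flaky_worker` are hypothetical names, and real supervisors offer several strategies that this sketch does not model.

```python
def supervise(task, max_restarts):
    """Run task(); on failure restart it, giving up after max_restarts restarts.

    Returns (result, crash_log) so the failures stay available for later analysis.
    """
    crash_log = []
    while True:
        try:
            return task(), crash_log
        except Exception as exc:
            crash_log.append(repr(exc))
            if len(crash_log) > max_restarts:
                raise RuntimeError(f"restart threshold exceeded: {crash_log}") from exc

def make_flaky_worker(failures_before_success):
    """Hypothetical worker that crashes a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def worker():
        state["calls"] += 1
        if state["calls"] > failures_before_success:
            return "done"
        raise ValueError(f"crash #{state['calls']}")

    return worker

print(supervise(make_flaky_worker(2), max_restarts=3))
```

The worker itself contains no recovery code; the decision to retry or to give up lives entirely in the supervising function, which is the separation the list item describes.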
Of course I could talk about its functional roots but this aspect is somewhat orthogonal to the main goal (high availability). The main component of the functional nature which contributes generously to the target goal is, IMO: "share nothing". This characteristic helps contain "side effects" and reduce the need for costly synchronization mechanisms.
I guess all these characteristics help make the case for using Erlang in business-critical applications.
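To put the 5x9's figure mentioned earlier into numbers, this short calculation converts an availability fraction into a yearly downtime budget. It assumes a 365-day year; the function name is just for illustration.

```python
def downtime_minutes_per_year(availability, days_per_year=365):
    """Allowed downtime in minutes per year for a given availability fraction."""
    minutes_per_year = days_per_year * 24 * 60
    return (1.0 - availability) * minutes_per_year

for label, availability in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
    print(f"{label}: {downtime_minutes_per_year(availability):.2f} minutes/year")
```

Five nines works out to roughly 5.26 minutes of downtime per year, which is why features such as hot code swapping matter: there is no budget for planned restarts.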
One thing Erlang isn't really good at: processing big blocks of data.

We built a betting exchange (aka prediction market) using Erlang. We chose Erlang over some of the more traditional financial languages (C++, Java etc) because of the built-in concurrency. Markets function very similarly to telephony exchanges. Our CTO gave a talk on our use of Erlang at CTO talk.
We also use CouchDB and RabbitMQ as part of our stack.

Erlang comes from Ericsson, and is used within some of their telecoms systems.
Outside telecoms, CouchDB (a document-oriented database) is possibly the best-known Erlang application so far.
Why Erlang? From the overview (worth reading in full):
The document, view, security and
replication models, the special
purpose query language, the efficient
and robust disk layout and the
concurrent and reliable nature of the
Erlang platform are all carefully
integrated for a reliable and
efficient system.

I came across this in the process of writing up a report: Erlang in Acoustic Ray Tracing.
It's an experience report on a research group's attempt to use Erlang for acoustic ray tracing. They found that while the program was easier to write, less buggy, etc., it scaled worse and performed 10x slower than a comparable C program. So one area where it may not be well suited is CPU-intensive scenarios.
Do note, though, that the people who wrote the paper were in the early stages of learning Erlang, and may not have known the proper development practices for CPU-intensive Erlang.

Apparently, Yahoo used Erlang to make something it calls Harvester. Article about it here: http://www.ddj.com/architect/220600332

What is erlang good for?
http://beebole.com/en/blog/erlang/why-erlang/
http://www.aquabu.com/2008/2/15/erlang-pragmatic-studio-day-3-notes
http://www.reddit.com/r/programming/comments/9q0lr/erlang_and_highfrequency_trading/
(jerf's answer)
It's important to realize that Erlang's 4 parts, namely the language itself, the VMs (BEAM, HiPE), the standard libs (plus modules on GitHub, CEAN, etc.) and the development environment, are being steadily updated, expanded and improved. For example, I remember reading that floating-point performance improved when the Wings 3D author realized it needed to improve (I can't find a source for this). And this guy just wrote about it:
http://marian-dan.com/wordpress/?p=324
A couple of years ago, Tim Bray's Wide Finder publicity and all the folks starting to do web app frameworks and HTTP servers led (at least in part) to improved regex and binaries handling. And there's all the work integrating HiPE and SMP, the Dialyzer project, multiple unit testing and build libs springing up...
So its sweet spot is expanding. The difficult thing is that the official docs can't keep up very well, and the mailing list and Erlang blogosphere volume are growing quickly.

We are using Erlang to provide the back-end muscle power for our really real-time browser-based multi-player game Pixza. We don't use Flash or any other third-party plugins, though the game is real-time multi-player. We use pure JS and COMET techniques instead. And Erlang supports the "really realtimeliness" of Pixza.

I'm working for wooga, a social game company, and we use Erlang for some of our game backends (basically HTTP APIs for millions of daily users) and auxiliary services like an iOS push notification provider, payment etc.
I think it really shines in network-related tasks, and it makes it fairly straightforward to structure and implement simple and complex network services alike. Distribution, fault tolerance and performance are easy to achieve because Erlang already has some of the key ingredients built in, and they have been used for a long time in critical production infrastructure. So it's not like "the new hip technology thing 0.0.2 alpha".
I know that other game companies use Erlang as well. You should be able to find presentations on slideshare about that.

Erlang draws its strength from being a functional language with no shared memory. Hence, IMO, Erlang won't be suitable for applications that require in-place memory manipulation, image editing for example.

Comet app over REST in erlang?

I am a newbie to Erlang and am trying to switch to Erlang for our latest project. Since this is going to be a real-time chat (long-polled) system for file sharing on the fly, I realized after a bit of digging around that Erlang would be the most appropriate choice because of its high concurrency. People also suggested using Yaws, since it can handle up to 50k parallel connections.
That sounds awesome, but since I am a newbie (to both Erlang and Comet applications), I am unable to work out the right technology stack / architecture. Also, because documentation is relatively sparse, I am unable to figure out how the individual pieces (web server, application layer, DB, message queue) would fit together for such an application. The application is going to run off a desktop client only (no web presence required), so we need to build a REST API for the functionality.
It would be great, if someone could point me in the right direction to proceed.
Thanks

Nitrogen has a very slick Comet feature built-in. It will work with the three most popular Erlang web servers, including the one you're already considering, YAWS.
Nitrogen doesn't do anything in particular about data storage. It's not one of those web frameworks that insists on managing the DB for you. You're free to use Mnesia or whatever else you like. If this bothers you, you might consider Erlyweb instead. It doesn't do Comet for you like Nitrogen does, but it's more of the manage-everything-for-me sort of web framework.

You could use:
ejabberd as the XMPP server
Mnesia as the database
YAWS as the web server
Message queue: you can implement that in Erlang or use an enterprise solution such as RabbitMQ

The all new Zotonic application may inspire you. It's a webapp running off mochiweb for HTTP service with webmachine for the REST API. And it's using good ol' PostgreSQL as database.
It has comet support implemented.
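If the Comet / long-polling idea itself is new to you, this is roughly what happens on the server for each waiting client, sketched in Python rather than Erlang. It does not use Zotonic, Nitrogen or YAWS APIs; `long_poll` and the status codes it returns are assumptions chosen for the illustration.

```python
import queue

def long_poll(mailbox, timeout_s):
    """Block until a message arrives or the timeout expires.

    Returns (status, body) with HTTP-like codes: 200 plus the message,
    or 204 when nothing arrived and the client should simply poll again.
    """
    try:
        return 200, mailbox.get(timeout=timeout_s)
    except queue.Empty:
        return 204, None

mailbox = queue.Queue()
mailbox.put("file shared: report.pdf")
print(long_poll(mailbox, timeout_s=0.1))  # a message is already waiting
print(long_poll(mailbox, timeout_s=0.1))  # nothing pending, so this times out
```

Each connected client ties up one waiting request, which is why a runtime with very cheap processes is attractive for this kind of application.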
