Apache Flink vs Twitter Heron?

There are a lot of questions comparing Flink vs Spark Streaming, Flink vs Storm and Storm vs Heron.
This question stems from the fact that both Apache Flink and Twitter Heron are true stream processing frameworks (not micro-batch, like Spark Streaming). Twitter decommissioned Storm last year and is using Heron instead (which is basically Storm reworked).
There are nice presentations by Slim Baltagi on Flink and Flink vs Spark:
https://www.youtube.com/watch?v=G77m6Ou_kFA
Nice research by Ilya Ganelin on various streaming frameworks:
https://www.youtube.com/watch?v=KkjhyBLupvs
Pretty interesting thoughts on Flink vs Storm:
What is/are the main difference(s) between Flink and Storm?
But I haven't seen any comparison of new Storm/Heron vs Apache Flink.
Both projects are pretty young, and both support running previously written Storm applications, among many other things. Flink fits better into the Hadoop ecosystem, while Heron fits better into the Twitter-based stack.
Any thoughts?

All of the points in the referenced article comparing Apache Flink and Apache Storm also apply to Twitter's Heron. Heron provides exactly the same type of semantics and functionality as Storm. Heron is really best understood simply as a re-implementation of Storm that better fits Twitter's operational requirements.

Heron is a stream processing engine developed by Twitter and donated to Apache on 26 February 2018.
According to Twitter, throughput was 10–14x higher than Storm's in all experiments, and latency was 5–15x lower than Storm's.
Other than throughput and latency, it provides:
Easy debugging (every task runs in process-level isolation).
Handling of spikes and congestion (using a backpressure mechanism).
Full backward compatibility with Storm, which means only pom file changes are required.
https://blog.twitter.com/engineering/en_us/a/2015/flying-faster-with-twitter-heron.html
https://apache.github.io/incubator-heron/
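To illustrate the general idea behind the backpressure mechanism mentioned above, here is a minimal Python sketch. It is not Heron's implementation; the function name `run_pipeline` and its parameters are made up for illustration. A bounded buffer sits between a fast producer and a slow consumer, and the producer is simply blocked whenever the buffer is full, so a spike upstream cannot grow memory without limit.

```python
import queue
import threading
import time


def run_pipeline(items, capacity=2, consumer_delay=0.001):
    """Push items through a bounded buffer; return (consumed, max_buffered)."""
    # Bounded buffer: put() blocks while it already holds `capacity` items.
    buffer = queue.Queue(maxsize=capacity)
    consumed = []
    max_buffered = 0
    done = object()  # sentinel that marks the end of the stream

    def consumer():
        nonlocal max_buffered
        while True:
            # Record how many items were waiting before taking the next one.
            max_buffered = max(max_buffered, buffer.qsize())
            item = buffer.get()
            if item is done:
                break
            time.sleep(consumer_delay)  # simulate a slow downstream task
            consumed.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()
    for item in items:
        buffer.put(item)  # blocks when the buffer is full: the backpressure
    buffer.put(done)
    worker.join()
    return consumed, max_buffered


if __name__ == "__main__":
    result, depth = run_pipeline(list(range(20)), capacity=3)
    print(len(result), "items consumed; buffer never held more than", depth)
```

The producer here is slowed to the consumer's pace instead of dropping records, which is the trade-off a backpressure design makes.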

Related

What are the key differences between OpenTracing and Zipkin?

I am looking into distributed tracing tools.
I found two that are very popular:
OpenTracing - https://opentracing.io/
Zipkin - https://zipkin.io/
What are the key differences between them?
Which one would you recommend?
Would you recommend any other open source distributed tracing tool?
Getting a handle on the distributed tracing space can be a bit confusing. Here's a quick summary...
Open Source Tracers
There are a number of popular open source tracers, which is where Zipkin sits:
Zipkin
Jaeger
Haystack
Commercial Tracers
There are also a lot of vendors offering commercial monitoring/observability tools which are either centred around or include distributed tracing:
AppDynamics
AWS X-Ray
Azure Application Insights
Datadog
Dynatrace
Google Cloud Trace
Honeycomb
Lightstep
New Relic
SignalFX
(probably 100 more...)
Standardisation Efforts
Alongside all these products are numerous attempts at creating standards around distributed tracing. These typically start by creating a standard API for the trace-recording side of the architecture, and sometimes extend to become prescriptive about the content of traces or even the wire format. This is where OpenTracing fits in. So it is not a tracing solution itself, but an API that can be implemented by the trace recording SDKs of multiple tracers, allowing you to swap between vendors more easily. The most common standards are:
OpenTracing
OpenCensus
OpenTelemetry
Note that the first two in the list have been abandoned, with their contributors joining forces to create the third one together.[1]
[1] https://opensource.googleblog.com/2019/05/opentelemetry-merger-of-opencensus-and.html
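As a rough sketch of what "an API that can be implemented by multiple tracers" means in practice, consider the Python below. Every name in it (`Tracer`, `record_span`, `InMemoryTracer`, `NoopTracer`, `handle_request`) is hypothetical and chosen for illustration; this is not the actual OpenTracing interface. The point is only that application code depends on the abstract API, so the implementation behind it can be swapped.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class Tracer(ABC):
    """Hypothetical vendor-neutral tracing API (not the real OpenTracing API)."""

    @abstractmethod
    def record_span(self, operation: str, duration_ms: float) -> None:
        """Record that `operation` took `duration_ms` milliseconds."""


class InMemoryTracer(Tracer):
    """Stand-in for one vendor's SDK: keeps recorded spans in a list."""

    def __init__(self) -> None:
        self.spans: List[Tuple[str, float]] = []

    def record_span(self, operation: str, duration_ms: float) -> None:
        self.spans.append((operation, duration_ms))


class NoopTracer(Tracer):
    """Stand-in for a second implementation: discards every span."""

    def record_span(self, operation: str, duration_ms: float) -> None:
        pass


def handle_request(tracer: Tracer) -> str:
    """Application code: it only knows about the Tracer interface."""
    tracer.record_span("handle_request", 12.5)
    return "ok"


if __name__ == "__main__":
    tracer = InMemoryTracer()
    handle_request(tracer)
    print(tracer.spans)
```

Switching vendors then means constructing a different `Tracer` at start-up, with no change to `handle_request`.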

Can AppDynamics work with a Prometheus backend?

Most popular logging and monitoring stacks, such as the ELK stack or a time-series database with Grafana, are designed to be integrated. Can AppDynamics work with other samplers/DBs, in particular Prometheus?
There are integration tools available between InfluxDB and AppDynamics, and between Grafana and AppDynamics:
https://github.com/Appdynamics/MetricMover
https://grafana.com/plugins/dlopes7-appdynamics-datasource/installation
There's nothing that integrates Prometheus and AppDynamics at the moment.
I'm not sure there will be one going forward, seeing how they are competing in the same space from different vantage points (open source vs. enterprise).

Apache Flume vs Apache Flink difference

I need to read a stream of data from some source (in my case it's a UDP stream, but it shouldn't matter), transform each record and write it to HDFS.
Is there any difference between using Flume or Flink for this purpose?
I know I can use Flume with a custom interceptor to transform each event.
But I am new to Flink, so to me it looks like Flink will do the same.
Which one is better to choose? Is there a difference in performance?
Please, help!
Disclaimer: I'm a committer and PMC member of Apache Flink. I do not have detailed knowledge about Apache Flume.
Moving streaming data from various sources into HDFS is one of the primary use cases for Apache Flume as far as I can tell. It is a specialized tool and I would assume it has a lot of related functionality built in. I cannot comment on Flume's performance.
Apache Flink is a platform for data stream processing and is more generic and feature-rich than Flume (e.g., support for event time, advanced windowing, high-level APIs, fault-tolerant and stateful applications, ...). You can implement and execute many different kinds of stream processing applications with Flink, including streaming analytics and CEP.
Flink features a rolling file sink to write data streams to HDFS files and lets you implement all kinds of custom behavior via user-defined functions. However, it is not a specialized tool for data ingestion into HDFS. Do not expect a lot of built-in functionality for this use case. Flink provides very good throughput and low latency.
If you do not need more than simple record-level transformations, I'd first try to solve your use case with Flume. I would expect Flume to come with a few features that you would need to implement yourself when choosing Flink. If you expect to do more advanced stream processing in the future, Flink is definitely worth a look.
Disclaimer: I'm a committer of Apache Flume. I do not have detailed knowledge about Apache Flink.
For the use case you have described, Flume could be the right choice.
You could use the Exec Source until netcat UDP source gets committed to the codebase.
For the transformation, it's hard to provide suggestions, but you might want to take a look at Morphline Interceptor.
Regarding the channel, I would recommend Memory Channel, because if the source is UDP, some negligible data loss should be acceptable.
Sink-wise, HDFS Sink probably covers your needs.

Examples of production Erlang deployments

I am currently learning Erlang.
Can SO users give interesting examples of any of their Erlang application deployments?
I want to gain some insight into common Erlang uses beyond telecoms, and any problems or unexpected benefits Erlang brought during development/deployment.
I hope this will give some broader context and whet the appetite for myself and anyone else jumping into Erlang!
Thanks in advance!
Who uses Erlang for product development:
Bluetail/Alteon/Nortel (distributed, fault tolerant email system, SSL accelerator)
Cellpoint (Location-based Mobile Services)
Corelatus (SS7 monitoring)
dqdp.net (in Latvian) (Web Services)
Facebook (Facebook chat backend)
Finnish Meteorological Institute (Data acquisition and real-time monitoring)
IDT corp. (Real-time least-cost routing expert systems)
Klarna (Electronic payment systems)
Mobilearts (GSM and UMTS services)
Netkit Solutions (Network Equipment Monitoring and Operations Support Systems)
Process-one (Jabber Messaging)
Schlund + Partner (Messaging and Interactive Voice Response services)
Quviq (Software Test Tool)
RabbitMQ (AMQP Enterprise Messaging)
T-Mobile (previously one2one) (advanced call control services)
Telia (a telecomms operator)
Vail Systems (Interactive Voice Response systems)
Wavenet (SS7 and IVR applications)
Our first application was a web/sms social network and I wrote a long paper on the subject which can be read here.
We've built a web app based on an Erlang backend.
Erlang is in charge of the business logic, the security and data store.
The browser communicates with it exclusively through JSON services and does the rendering.
It will be in beta soon, and to give you an idea of the app there is a video here
There are also some resources here and here about what we learned along the way.
Get to know the release tools Erlang/OTP already provides.
Erlang bootscripts are wonderful for ensuring that all the running applications needed are present and of the correct version. Working within the OTP framework for releases will be much easier than trying to invent your own. Erlang has lots of tools for making sure deployments can be done both live and without breaking running services. The language and runtime are designed for this so they've done a lot of the heavy lifting for you. I've found the tools useful even for small "non-enterprise" apps and deployments.
Of course there are always applications like Wings 3D, which is for 3D modelling. It's not exactly a "deployment", because these sorts of programs are used by anyone from individuals to teams of artists in their pipeline. There are other projects for things like simulation, but I'm not sure how many companies are publicly stating that they use Erlang. As for me, I'm planning to adopt it for my company for industrial automation.

Where is Erlang used and why? [closed]

I would like a list of the most common applications/websites/solutions where Erlang is used, successfully or not.
An explanation of why it is used in a specific solution instead of other programming languages would be very much appreciated, too.
A list of BAD Erlang case studies (cases in which Erlang is misused) would be interesting as well.
From Programming Erlang:
Many companies are using Erlang in their production systems:
• Amazon uses Erlang to implement SimpleDB, providing database services as a part of the Amazon Elastic Compute Cloud (EC2).
• Yahoo! uses it in its social bookmarking service, Delicious, which has more than 5 million users and 150 million bookmarked URLs.
• Facebook uses Erlang to power the backend of its chat service, handling more than 100 million active users.
• WhatsApp uses Erlang to run messaging servers, achieving up to 2 million connected users per server.
• T-Mobile uses Erlang in its SMS and authentication systems.
• Motorola is using Erlang in call processing products in the public-safety industry.
• Ericsson uses Erlang in its support nodes, used in GPRS and 3G mobile networks worldwide.
The most popular open source Erlang applications include the following:
• The 3D subdivision modeler Wings 3D, used to model and texture polygon meshes.
• The Ejabberd system, which provides an Extensible Messaging and Presence Protocol (XMPP) based instant messaging (IM) application server.
• The CouchDB “schema-less” document-oriented database, providing scalability across multicore and multiserver clusters.
• The MochiWeb library that provides support for building lightweight HTTP servers. It is used to power services such as MochiBot and MochiAds, which serve dynamically generated content to millions of viewers daily.
• RabbitMQ, an AMQP messaging protocol implementation. AMQP is an emerging standard for high-performance enterprise messaging.
ejabberd is one of the best-known Erlang applications and the one I learnt Erlang with.
I think it's one of the most interesting projects for learning Erlang because it really builds on Erlang's strengths. (However, some will argue that it's not OTP, but don't worry, there's still a trove of great code inside...)
Why?
An XMPP server (like ejabberd) can be seen as a high-level router, routing messages between end users. Of course there are other features, but this is the most important aspect of an instant messaging server. It has to route many messages simultaneously and handle a lot of TCP/IP connections.
So we have 2 features:
handle many connections
route messages given some aspects of the message
These are both areas where Erlang shines.
handle many connections
It is very easy to build scalable non-blocking TCP/IP servers with Erlang. In fact, it was designed to solve this problem.
And given that it can spawn hundreds of thousands of processes (not threads; it's a share-nothing approach, which is simpler to design), ejabberd is designed as a set of Erlang processes (which can be distributed over several servers):
client connection process
router process
chatroom process
server to server processes
All of them exchanging messages.
route messages given some aspects of the message
Another very lovable feature of Erlang is pattern matching.
It is used throughout the language.
For instance, take the following:
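%% Record assumed from the description that follows: a config with a type field.
-record(config, {type}).
access(moderator, _Config) -> rw;
access(participant, _Config) -> rw;
access(visitor, #config{type="public"}) -> r;
access(visitor, #config{type="public_rw"}) -> rw;
access(_User, _Config) -> none.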
That's 5 different clauses of the access function.
Erlang tries the clauses in order and runs the first one whose patterns match the arguments received. (Config is a structure of type #config which has a type attribute.)
That makes business rules very easy to write, and much clearer than chaining if/else or switch/case.
To wrap up
Writing scalable servers is the whole point of Erlang. Everything is designed to make this easy. To the two previous features, I'd add:
hot code upgrade
Mnesia, a distributed relational database (included in the base distribution)
mochiweb, on which most Erlang HTTP servers are built
binary support (decoding and encoding binary protocols is easy)
a great community with great open source projects (ejabberd, CouchDB, but also Webmachine, Riak and a slew of libraries that are very easy to embed)
Fewer LOCs
There is also this article from Richard Jones. He rewrote an application from C++ to Erlang and ended up with 75% fewer lines in Erlang.
The list of the most common applications for Erlang has been covered (CouchDB, ejabberd, RabbitMQ etc.), but I would like to contribute the following.
The reason why it is used in these applications comes from the core strength of Erlang: managing application availability.
Erlang was built from the ground up for the telco environment, which requires that systems meet at least 5x9's availability (99.999% yearly up-time). This figure doesn't leave much room for downtime during a year! Primarily for this reason, Erlang comes loaded with the following features (non-exhaustive):
Horizontal scalability (ability to distribute jobs across machine boundaries easily through seamless intra & inter machine communications). The built-in database (Mnesia) is also distributed by nature.
Vertical scalability (ability to distribute jobs across processing resources on the same machine): SMP is handled natively.
Code Hot-Swapping: the ability to update/upgrade code live during operations
Asynchronous: the real world is async so Erlang was built to account for this basic nature. One feature that contributes to this requirement: Erlang's "free" processes (>32000 can run concurrently).
Supervision: many different strategies for process supervision with restart strategies, thresholds etc. Helps recover from corner-cases/overloading more easily whilst still maintaining traces of the problems for later trouble-shooting, post-mortem analysis etc.
Resource Management: scheduling strategies, resource monitoring etc. Note that the default process scheduler operates with O(1) scaling.
Live debugging: the ability to "log" into live nodes at will helps trouble-shooting activities. Debugging can be undertaken live with full access to any process' running state. Also the built-in error reporting tools are very useful (but sometimes somewhat awkward to use).
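To make the "restart strategies, thresholds" part of the supervision item more concrete, here is a small Python sketch of the idea. It is only an analogue written for illustration, not how OTP supervisors are implemented, and the names `supervise` and `RestartLimitExceeded` are invented. A failing worker is restarted, but if it fails more than `max_restarts` times within `period` seconds, the supervisor gives up and escalates the failure.

```python
import time
from collections import deque
from typing import Callable, Deque


class RestartLimitExceeded(Exception):
    """Raised when the worker fails more often than the threshold allows."""


def supervise(worker: Callable[[], object],
              max_restarts: int,
              period: float,
              clock: Callable[[], float] = time.monotonic) -> object:
    """Run `worker`, restarting it after a crash.

    Gives up once more than `max_restarts` restarts fall within any
    window of `period` seconds.
    """
    restart_times: Deque[float] = deque()
    while True:
        try:
            return worker()
        except Exception as exc:
            now = clock()
            restart_times.append(now)
            # Forget restarts that are older than the sliding window.
            while restart_times and now - restart_times[0] > period:
                restart_times.popleft()
            if len(restart_times) > max_restarts:
                raise RestartLimitExceeded(
                    f"more than {max_restarts} restarts in {period}s"
                ) from exc
            # Otherwise loop around, which restarts the worker.
```

A worker that crashes occasionally is restarted without fuss, while one that crashes in a tight loop is reported upwards instead of being restarted forever.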
Of course I could talk about its functional roots but this aspect is somewhat orthogonal to the main goal (high availability). The main component of the functional nature which contributes generously to the target goal is, IMO: "share nothing". This characteristic helps contain "side effects" and reduce the need for costly synchronization mechanisms.
I guess all these characteristics help make a case for using Erlang in business-critical applications.
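For a sense of scale, the five-nines figure quoted earlier in this answer can be turned into a downtime budget with a little arithmetic. The helper name below is made up for illustration, and the calculation assumes a 365-day year: 99.999% availability leaves roughly 5.26 minutes of downtime per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability
    given as a fraction (e.g. 0.99999 for "five nines")."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, availability in [("3x9", 0.999), ("4x9", 0.9999), ("5x9", 0.99999)]:
        minutes = allowed_downtime_minutes(availability)
        print(f"{label}: about {minutes:.2f} minutes of downtime per year")
```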
One thing Erlang isn't really good at: processing big blocks of data.
We built a betting exchange (aka prediction market) using Erlang. We chose Erlang over some of the more traditional financial languages (C++, Java etc) because of the built-in concurrency. Markets function very similarly to telephony exchanges. Our CTO gave a talk on our use of Erlang at CTO talk.
We also use CouchDB and RabbitMQ as part of our stack.
Erlang comes from Ericsson, and is used within some of their telecoms systems.
Outside telecoms, CouchDB (a document-oriented database) is possibly the best known Erlang application so far.
Why Erlang ? From the overview (worth reading in full):
The document, view, security and
replication models, the special
purpose query language, the efficient
and robust disk layout and the
concurrent and reliable nature of the
Erlang platform are all carefully
integrated for a reliable and
efficient system.
I came across this in the process of writing up a report: Erlang in Acoustic Ray Tracing.
It's an experience report on a research group's attempt to use Erlang for acoustic ray tracing. They found that while the program was easier to write, less buggy, etc., it scaled worse and ran 10x slower than a comparable C program. So one spot where it may not be well suited is CPU-intensive scenarios.
Do note, though, that the people who wrote the paper were in the early stages of learning Erlang, and may not have known the proper development procedures for CPU-intensive Erlang.
Apparently, Yahoo used Erlang to make something it calls Harvester. Article about it here: http://www.ddj.com/architect/220600332
What is erlang good for?
http://beebole.com/en/blog/erlang/why-erlang/
http://www.aquabu.com/2008/2/15/erlang-pragmatic-studio-day-3-notes
http://www.reddit.com/r/programming/comments/9q0lr/erlang_and_highfrequency_trading/
(jerf's answer)
It's important to realize that Erlang's 4 parts (the language itself, the VMs (BEAM, HiPE), the standard libs (plus modules on GitHub, CEAN, etc.) and the development environment) are being steadily updated, expanded and improved. For example, I remember reading that the floating point performance improved when Wings3D's author realized it needed to improve (I can't find a source for this). And this guy just wrote about it:
http://marian-dan.com/wordpress/?p=324
A couple of years ago, Tim Bray's Wide Finder publicity and all the folks starting to do web app frameworks and HTTP servers led (at least in part) to improved regex and binaries handling. And there's all the work integrating HiPE and SMP, the dialyzer project, multiple unit testing and build libs springing up, ...
So its sweet spot is expanding. The difficult thing is that the official docs can't keep up very well, and the mailing list and Erlang blogosphere volume are growing quickly.
We are using Erlang to provide the back-end muscle power for our really real-time browser-based multi-player game Pixza. We don't use Flash or any other third-party plugins, though the game is real-time multi-player. We use pure JS and COMET techniques instead. And Erlang supports the "really realtimeliness" of Pixza.
I work for Wooga, a social game company, and we use Erlang for some of our game backends (basically HTTP APIs for millions of daily users) and for auxiliary services like an iOS push notification provider, payment etc.
I think it really shines in network-related tasks, and it makes it fairly straightforward to structure and implement simple and complex network services alike. Distribution, fault tolerance and performance are easy to achieve because Erlang already has some of the key ingredients built in, and they have been used for a long time in critical production infrastructure. So it's not like "the new hip technology thing 0.0.2 alpha".
I know that other game companies use Erlang as well. You should be able to find presentations on slideshare about that.
Erlang draws its strength from being a functional language with no shared memory. Hence, IMO, Erlang won't be suitable for applications that require in-place memory manipulation, image editing for example.

Resources