Erlang Design Advice regarding HTTP services

Erlang Design Advice regarding HTTP services - erlang

I'm new to Erlang but I would like to get started with an application which feels applicable to the technology due to the concurrency desires I have.
This picture highlights what i want to do.
http://imagebin.org/163917
Where messages are pulled from a queue and routed to worker processes which have previously been setup as a result of a user making some input a form in a Django app. The setup requires some additional database (preexisting database so I don't want to use ETS/DETS for this bit) lookup which then talks to the message router and creates a relevant process.
My issue comes with given that I may want to ask my Django app in the future for all the workers that need to be setup and task them in the first place, what is the best way to communicate here. I favour HTTP/ json and have read up what little I can find on Mochiweb and MochiJson and I think that would do what I want. I was planning on having a OTP supervisor and application, so would it be sensible to have a seperate mochiweb process which then passes erlang messages to the router?
I have struggled a little with mochiweb due to all the tutorials talking about how you use a script to create a directory structure, which seems to put mochiweb centric to a design - which isn't want I want here, I want a lightweight mochiweb process that does occassional work.
Please tear this apart, all comments welcome.
Cheers
Dave

mochiweb is awesome but I think what you actually want is webmachine. The complete documentation is available here and here. In a nutshell, webmachine is a toolkit for making REST applications, which I think is what you want. It uses mochiweb behind the scenes but hides all of the complex (and undocumented) details. When you create a webmachine project you'll get a complete OTP application and a default resource. From there you'll do something like the following:
Add your own resources (or modify + rename the default one).
Modify the dispatcher so your resources and paths make sense for your app.
Add code to create and monitor your worker processes - probably a gen_server and a supervisor. See this and related articles for ideas. Note you'll want to start both under the main supervisor provided to you when you created your project.
Modify your resources to communicate with your gen_server.
I didn't quite follow everything else you are asking - it may be easier to answer any follow-up questions in comments.

Related

How to add data in Datadog to create custom dashboard?

I am new to Datadog APM. I have read few tutorials but I am unable to find how to to add data in Datadog to create custom dashboard?

The first step will be to make sure you have the datadog agent running, and that the APM component of it is running and ready to receive trace data from your applications (this option in your datadog.conf, which must be set to "true").
Second, you'll want to install the appropriate library(ies) for the languages your applications are written in. You can find them all listed in your datadog account on this page: https://app.datadoghq.com/apm/docs
Third, once the trace libraries are installed, you'll want to add trace integrations for the tools you're interested in collecting APM data on, and the instructions for those will be found in each library's docs. (E.g, Python, Ruby, and Go)
The integraitons will be a fairly quick way to get pretty granular spans on where your applications have higher latency, errors, etc. If from there you'd like to go even further, each library's docs also have instructions on how you can write your own custom tracing functions to expose more info on your custom applications--that's a little more work, but is fairly straight-forward. You'll probably want to add those bit-by-bit as you go.
Then you'd be all set, I think. You'll be tracing services, resources to get the latency, request-count, and error-count of your application requests, and you can drill down to the flame-graphs to further understand what requests spend the most time where in your applications.
Looking back now, seems like they made some recent changes to the setup process that makes it even easier to get the web framework and database integrations added if you're using Python. They've even got a command line tool in their get-started section now.
Hope this helps! And reach out to their support team (support#datadoghq.com) if you run into issues along the way--they're always happy to lend a hand.

Erlang supervisor processes

I have been learning Erlang intensively, and after finishing 'Programming Erlang' from Joe Armstrong, there is one thing that I keep coming back to.
In my mind a Supervisor spawns One process per child handler. So each declared gen_server type handler will run as a separate process.
What happens if you are building a tiny web server and you want each requests to be its own process. Do you still conform to OTP principles and use a gen_server somehow (how ?), or do you create your own behaviour?
How does Cowboy handle this for eg. ? Does it still use gen_server ?

tl;dr: I find that trying to figure out the "correct" supervision structure a the beginning of a project is a form of premature optimization.
The "right" way to design your supervision tree depends on what the worker parts of your application are doing. In the case of a web server I would probably first explore something along the lines of:
top supervisor (singular)
data service supervisor (one per service type)
worker pool (all workers under the service sup)
client connection supervisor (one)
connection worker pool (or one per connection, have to play with it to decide)
logical supervisor (as appropriate -- massive variance here, depending on problem domain)
workers or supervisors (as appropriate -- have to explore/know the problem domain to have any idea how this should be structured)
So that's several workers per supervisor type at the lower level. I haven't used Cowboy so I don't know how it is organized. The point I'm trying to make is that while the mechanics of handling data services serving web pages are relatively trivial, the part of the system that actually does the core problem-solving work might not be and this is going to dictate everything interesting about the system.
It is a bad thing to have your problem-solving bits mixed in the same module as your web-displaying or connection handling bits. Ideally you should be able to use the same logic units in a native application, a web application and a networked service without any changes.
Ultimately the answer to whether you should have 1:1 supervisors to workers or 1:n depends on what you're doing and what restart strategy gives you the best balance among recovery to a known consistent state, latency felt by the user, and resource usage.
One of my favorite things about Erlang is that I can start with a naive supervisor structure like the one above, play with it until I see where its not so good, and rather easily switch things around and experiment with alternatives without fundamentally altering my system much. (The same goes for playing with alternative data representations if you write proper abstractions around them.) So first, get something that works in testing. Then load it up and see if you can break it. Then start worrying about the details, after you understand where the problems actually are.

It is a common pattern to spawn one server per client in erlang, You will then use a supervisor using the simple_one_to_one strategy for the children servers. This allows to ask the server to start a server on_demand. Generally this is used when you don't know how many processes you will need, and when the processes are independent (a crash of one process should not impacts the other).
There is a very good information in the site learningyousomeerlang.com (LYSE supervisor chapter). the whole site is worth to read.

Automating Azure VIP Swap

I have an ASP.NET MVC 4 app hosted as an Azure web role. I want to do something that seems like it should be pretty standard: I want to create a function that I can call that initiates a VIP swap and raises and event (or calls a callback) when the VIP Swap operation is done.
Just to add some context to the situation: My website implements a workflow that takes about an hour (or less) to complete. If I want to release a new version of the website code, it's convenient (i.e. much less "backward compatibility" code to write) to first let all of the current users complete the workflow so that the new code doesn't need to deal with data created by the previous version of the code. So a management function in my website would first poke a value into the database that disables new workflows; it would then wait until all current workflows are done; it would then call the "VIP Swap" routine; finally, when the VIP Swap routine signals its completion, it would poke the database value to re-enable new workflows.
I found the Microsoft documentation for how to programmatically initiate a VIP swap here:
http://msdn.microsoft.com/en-us/library/ee460814.aspx
The procedure involves POSTing to a magic URL and including some headers in the POST, then periodically performing a GET to a magic URL and checking the response code.
The more I think about this, the more non-trivial it seems. In addition to the basic complexities of wiring up a background timer and completion notification, I don't know what complexities, if any, I might run into trying to do this stuff in the IIS environment. Can I even perform HTTP operations on a background thread? For that matter, will I run into complications just trying to use any of the half dozen or so different "do things in the background" mechanisms baked into .NET?
Any help or guidance will be greatly appreciated. In particular, I'd be ecstatic if someone could point me at a ready-to-go implementation of this function!

I don't think you will find an easy solution to this as the fabric controller is setup to do some very fancy things without your involvement. Running hour-long workflows on a cloud computing environment, where an instance can be pulled out from underneath you, (with a maximum of 5 minutes from the OnStopping event being called to clean up) requires that you do other work anyway to make sure that all of your tasks complete.
The simple question is "What do you do if an instance goes down when workflows are still running?" Do you restart them or are they lost? If they get lost then you don't care anyway, so killing workflows for an upgrade are equally unimportant. If you re-start them then use that same mechanism to decide whether or not a node is due to be shut down, and distribute the jobs accordingly. This pattern is eerily similar to the Hadoop JobTracker. Don't just run the workflows on any 'ol instance. Submit them to a (job tracker) service that decides what to do. The (job tracker) service can then use the service management API to scale up as many instances as you need running the version that you want, run workflows on the appropriate node, and shut them down when they are no longer needed or are outdated.
Unfortunately this may not be the simple solution that you are looking for, but something in your architecture needs to change, rather than trying to force PaaS to fit with your current approach. Decompose your workloads, create loosely coupled services, design for failure, and a few other cloud/distributed computing practices need to be considered. There is a reason why Hadoop is built the way that it is — and it has a reputation for being able to get work done on a bunch of somewhat unreliable commodity hardware.

How to run a very small amount of code asynchronously?

I'd like to run a small amount of code asynchronously in my rails application. The code logs some known information. I'd like this task to not block the rest of the response from my app. The operation is way too lightweight and frequent for it to be done as a delayed job tasks.
I'm thinking to just use:
Thread.new do
# my logging code
end
and call it a day. Is this achieving what I want it to achieve? Are there any drawbacks?

It may be overkill for your particular usage, but have you considered using some form of Message Queuing Middleware such as STOMP, AMQP, OpenWire or even Jabber?
The basic outline (pseudo-code!) would be:
s = client.create_connection(user,pass,server_ip,port)
s.message_send("Log Message Goes Here")
You would then have at least one "consumer" at the other end of the message queue which would write the log message to a file/database/chatroom/IRC Channel/whatever-you-want-really-it's-all-code... :)
It would also mean that if in future you wanted to hand-off high-intensity processing jobs (invoice generation for example) you would already have the infrastructure in place to do so.
Also if you're looking for a really easy Messaging server, I recommend RabbitMQ - it's written in Erlang (but don't let that put you off!) and is very easy to setup.
This is my first post so I can't post any more than two links, but I've published links to all the technologies mentioned above in a gist # https://gist.github.com/1090372

Aside from handling some baseline resource contention things, this should be sufficient. The database resource comes to mind if you're logging there...a file if you're logging there. But the basic approach is fine I would think. Simpler is better...

How to architect Rails site that can be edited while running?

I am writing a Rails app that "scrapes/navigates" some other websites and webservices for content. I am using Mechanize and Savon to do the heavylifting.
But given the dynamic nature of the web, I'd like to make my calls to these editable by the admin users of the site - rather than requiring me to release a new version of the site.
The actual scraping thread happens async to the website, using the daemons gem.
My requirements are:
Thinking that the scraping/webservice calling code is quite simple, the easiest route is to make the whole class editable by the admins.
Keep a history of the scraping code - so that we can fairly easily revert if we introduce a problem.
Initially use the code from the file system, but as soon as thats been edited and stored somewhere, to use that code instead.
I am thinking my options are:
Store the code in the db (with a history table for the old versions)
Store the code in a private git repo somewhere and access that for the history/latest versions.
I am thinking the git route might be easiest, given its raison d'etre is to track file history...
But perhaps there is a gem/plugin that does all this for me, out of the box?
Thanks in advance for any tips/advice.
~chris

I really hope you aren't doing something like what's talked about here...
Assuming you are doing a proper mixin, there used to be a gem called "acts_as_versioned" which would do something like you want. It's been a while so I don't know if it's been turned into a plugin or if it's been abandoned. Essentially the process it uses was to provide a combination key for your versioned table.
Your database would have a structure like this:
Key column (id for the record)
Version column (id for the record's version)
All the record attributes
Let's say you had a table for your scripts, and the script you wanted has three versions. Your table would have the following records:
123, 3, '#Be good now'
123, 2, 'puts "Hi"'
123, 1, '#Do not be bad'
Getting the most recent version would be as simple as
Scripts.find :first, :conditions=>{:id=>123}, :order=>"version desc"
Rolling back would be as simple as removing the most recent version, or having another table with a pointer to the active version. It's up to you.
You are correct in that git, subversion, mercurial and company are going to be much better at this. To provide support, you just follow these steps:
Check out the script on the server (using a tag so you can manage what goes there at any time)
Set up a cron job to check out the new script periodically (like every six hours or whatever you feel comfortable with)
The daemon you have for running the script should run the new version automatically.

IF your site is already under source control, and IF you're running under mod_rails/passenger, you could follow this procedure:
edit scraping code
commit change locally
touch yourapp/tmp/restart.txt
that should give you history of the change and you shouldn't have to re-deploy.
A bit safer, but not sure if it's possible for you is on a test/developement server: make change, commit locally, test it, then on production server, git pull then touch tmp/restart.txt

I've written some big spiders and page analyzers in the past, and one of the things to keep in mind is what code is providing what service to the entire application.
Rails is providing the presentation of the data being gathered by your spidering engine. The presentation is one side of the coin, and spidering is the other, and they should be two separate code bases, tied together by some data-sharing mechanism, which, in your case, is the database. The database gives you some huge advantages as does having Rails available, when your spidering code is separate. It sounds like you have some separation already, but I'd recommend creating a wider gap. With that in mind, here's how I've done it before, and what I'd do now.
Previously, I had a separate app for my spidering that was spawning multiple spider tasks. Each task would look at a bunch of different URLs, throw their results in the database, then quit. Each time one quit the main app would spawn another spider to process more URLs. Each loop, the main app checked a YAML configuration file for run-time parameters, like how many sub-tasks it should have running, how many URLs they'd get, how long they'd wait for connections, etc. It stored the last modification date of the config file each time it loaded it so, if I made a change to the file, the app would sense it in a reasonably short time, reread the file, and adjust its behavior.
All state information about the URLs/pages/sites being scraped/spidered, was kept in the database so I could check on its progress. I could see how many had been processed or remained in the queue, the various result codes, and the content being returned. If I didn't like something I could even tweak the filters to skip junk pages, knowing the spidering tasks would be updated in a few minutes.
That system worked extremely well, spidered a major customer's series of websites without a glitch, running for several weeks as I added new sites to the list. (We were helping one of the Fortune 50 companies improve their sites, and every site had been designed and implemented by a different team, making every site completely different. My code had to be flexible and robust; I was really happy with how it worked out.)
To change it, these days I'd use a database table to hold all the configuration info. That way I could easily build an admin form, and let someone else inherit the task of adjusting the app's runtime configuration. The spider tasks would also be written so they'd pull their configuration from the database, rather than inherit it from the main app. I originally had the main app do all the administration and pass the config info to the spidering apps because I wanted to keep the number of connections to the database as low as possible. I was using Postgres and now know it could have easily handled the load, so by letting the individual tasks handle their configuration I could have made it more responsive.
By making the spidering engine separate from the presentation engine it was possible to temporarily stop one or the other without affecting the progress of the spidering job. Once I had the auto-reload of the prefs in place I don't think I had to stop the spidering engine, I just adjusted its prefs. It literally ran for weeks without stopping and we eventually pulled the plug because we had enough data for our needs.
So, I'd recommend tweaking your code so your spidering engine doesn't rely on Rails, instead it will be fired off by cron or a separate scheduling app. If you have to temporarily stop Rails your engine will run anyway. If you have to temporarily stop the engine then Rails can continue serving pages. The database sits between the two acting as the glue.
Of course, if the database goes down you're hosed all the way around, but what else is new? :-)
EDIT: Chris said:
"I see your point about the splitting the code out, though my Ruby-fu is low - not sure how far I can separate things without having to have copies of the ActiveModel/migrations stuff, plus some shared model classes."
If we look at your application as spider engine <--> | <-- database --> | <--> Rails/MVC/presentation, where the engine and Rails separately read and write to the database, and look at what each does well, that helps figure out how to break them into separate code bases.
Rails is designed to handle migrations, so let it. There's no reason to reinvent that wheel. But, how often do you do migrations, and what is effected when you do? You do them seldom once the application is stable, and, at that point you'd do them in a maintenance cycle to tweak the database. You can shut down the spidering engine and the web interface for a few minutes, migrate the database, then bring things up and you're off and running. Migrations are a necessary evil, but are hardly show-stoppers once in production. Most enterprises have "Software Sunday", or some pre-announced window of maintenance, so do the same.
ActiveRecord, modeling and associations are pretty easy to deal with too. The models are in a file that is required internally by Rails already, so the spidering engine can inherit the database know-how that way too; Multiple apps/scripts can use the same model file. You don't see the Rails books talk about it much, but ActiveRecord is actually pretty easy to use outside of Rails. Search the googles for activerecord without rails for more info.
You can pull in ActiveSupport also if you want some of its extensions to classes by doing a regular require, but the Rails "view" and "controller" logic, which normally applies to presenting the web interface, shouldn't be needed at all in the engine.
Business logic, which goes in the controllers in Rails could even be refactored into separate methods that get required by the Rails side of things and by the spidering engine. It's a different way of looking at Rails but falls in line with the "DRY" mantra - don't repeat yourself, so make things modular and require (or require_relative) bits and pieces that are the building blocks of the entire system.
If you don't want a totally separate codebase, you can take advantage of Rail's script runner, which gives a script access to the ActiveRecord::Base and ActiveRecord::Associations and ActiveSupport. Do a rails runner -h from your app's main directory, or search for "rails runner" for more info. runner is not good for a job that starts and runs many times an hour, because Rail's startup cost is high. But, if you have a long-running task, say one that runs in parallel with your rails app, then it's a great choice. I'd give it serious consideration for the spidering side of your application. Eventually you might want to break the spidering-engine out to a separate host so the presentation side has a dedicated host, so runner will help you buy time and do it in small steps.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart