To go API or not [closed] - ruby-on-rails

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
My company has this Huge Database that gets fed with (many) events from multiple sources, for monitoring and reporting purposes. So far, every new dashboard or graphic from the data is a new Rails app with extra tables in the Huge Database and full access to the database contents.
Lately, there has been an idea floating around of having external (as in, not our company but sister companies) clients to our data, and it has been decided we should expose a read-only RESTful API to consult our data.
My point is - should we use an API for our own projects too? Is it overkill to access a RESTful API, even for "local" projects, instead of direct access to the database? I think it would pay off in terms of unifying our team's access to the data - but is it worth the extra round-trip? And can a RESTful API keep up with the demands of running 20 or so queries per second and exposing the results via JSON?
Thanks for any input!

I think there's a lot to be said for consistency. If you're providing an API for your clients, it seems to me that by using the same API you'll understand it better wrt. supporting it for your clients, you'll be testing it regularly (beyond your regression tests), and you're sending a message that it's good enough for you to use, so it should be fine for your clients.
By hiding everything behind the API, you're at liberty to change the database representations and not have to change both API interface code (to present the data via the API) and the database access code in your in-house applications. You'd only change the former.
Finally, such performance questions can really only be addressed by trying it and measuring. Perhaps it's worth knocking together a prototype API system and studying it under load ?

I would definitely go down the API route. This presents an easy to maintain interface to ALL the applications that will talk to your application, including validation etc. Sure you can ensure database integrity with column restrictions and stored procedures, but why maintain that as well?
Don't forget - you can also cache the API calls in the file system, memory, or using memcached (or any other service). Where datasets have not changed (check with updated_at or etags) you can simply return cached versions for tremendous speed improvements. The addition of etags in a recent application I developed saw HTML load time go from 1.6 seconds to 60 ms.
Off topic: An idea I have been toying with is dynamically loading API versions depending on the request. Something like this would give you the ability to dramatically alter the API while maintaining backwards compatibility. Since the different versions are in separate files it would be simple to maintain them separately.

Also if you use the Api internally then you should be able to reduce the amount of code you are having to maintain as you will just be maintaining the API and not the API and your own internal methods for accessing the data.

I've been thinking about the same thing for a project I'm about to start, whether I should build my Rails app from the ground up as a client of the API or not. I agree with the advantages already mentioned here, which I'll recap and add to:
Better API design: You become a user of your API, so it will be a lot more polished when you decided to open it;
Database independence: with reduced coupling, you could later switch from an RDBMS to a Document Store without changing as much;
Comparable performance: Performance can be addressed with HTTP caching (although I'd like to see some numbers comparing both).
On top of that, you also get:
Better testability: your whole business logic is black-box testable with basic HTTP resquest/response. Headless browsers / Selenium become responsible only for application-specific behavior;
Front-end independence: you not only become free to change database representation, you become free to change your whole front-end, from vanilla Rails-with-HTML-and-page-reloads, to sprinkled-Ajax, to full-blown pure javascript (e.g. with GWT), all sharing the same back-end.
Criticism
One problem I originally saw with this approach was that it would make me lose all the amenities and flexibilities that ActiveRecord provides, with associations, named_scopes and all. But using the API through ActveResource brings a lot of the good stuff back, and it seems like you can also have named_scopes. Not sure about associations.
More Criticism, please
We've been all singing the glories of this approach but, even though an answer has already been picked, I'd like to hear from other people what possible problems this approach might bring, and why we shouldn't use it.

Related

What is the best way to integrate data from ProcessMaker?

The question is relatively plain, but mainly directed to the ProcessMaker experts.
I need to extract batches of data from ProcessMaker to perform analysis later.
Currently, we have v3.3 which has database model documented very well, and not so well documented REST API.
Having no clue on the best approach I suggest Process maker developers are encouraged to use direct database connection to fetch data batches.
However, from the perspective of the v.4 upgrade, I see that the database model is no longer a part of the official documentation, as well as the "Data Integration" chapter. Everything points out to use REST API for any data affairs.
So, I am puzzled. Which way to go for v3.3 and v4? REST API or direct DB connection?
ProcessMaker 4 was designed and built as an API first application. The idea is that everything that can and should be done through the application should be done via the API. In fact, this is the way all modern systems are designed. The days of accessing the database directly are gone and for good reason. The API is a contract. It is a contract that says that if you make a request in a certain way, you will get a certain response. On the other hand, we cannot guarantee that the database itself will always have the same tables. As a result if you access the database directly, and then we decide to change the database structure, you will be out of luck and anything you built that access the database directly will potentially fail.
So - the decision is clear. V4 is a modern architecture built with modern tooling. It performs and scales better than V3. It is the future of ProcessMaker. So, we highly recommend using this versioning, upgrading and staying on our mainline, and using the API for all activities related to the data models.

When creating an end to end iOS app, should the front end or back end be created first? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I've been a developer for 10 years (non iOS), and working for a large company have never created applications end to end. Just worked on very large applications, on pieces.
I'm starting to get into iOS for fun, and have an app in my head that I want to create. I've wireframed the entire thing using the iOS app 'interface'. Since then, I've begun to start coding. I have about 15 scenes in storyboard (the total app will probably be 100+), and right now I'm just using hardcoded 'fake' data.
However, I've recently begun to think that maybe I should be creating the database and some initial data there instead of using all this hardcoded fake data.
Does anyone have any suggestions and reasons why one way is better than another?
Should I create the back end before the front end? If I do, then each new scene I add I can work the real data in from the beginning instead of having to replace fake hard coded data.
Also, I know little about creating back ends. The application I'm creating is nothing like twitter, but for data access and for this example, let's say it is. It's main view of the application is something like twitter. The user can hit refresh and get many new data points ('tweets' in twitter) from the server. So the application could be very data intensive. Am I best off using something like Parse and paying for their services, or creating something in LAMP, or something else. I've worked with SQL and database a lot in my last 10 years and am very comfortable with that aspect of the back end.
Thoughts? Suggestions?
Thanks!
I'd say you have 3 options here :
Front-end first, back-end afterwards
Good thing is while developing your front-end, you may understand what's really relevant and what isn't. You probably won't do anything unnecessary on the back-end part. A bad thing though is that bad stuff may happen when you try to connect you back-end to your front-end, and involve some code refactoring on the front-end side, if you don't make sure they at least can work together.
Back-end first, front-end afterwards
You may here not really see where you're going while developing the back-end. You'll see (you may even know it already) that what you'll create for the client-side may not really be as it looked in your head.. You'll probably have to rework a lot on the back-end.
Front-and-back-end together
This is how I usually work. Start the front-end just as you did with hard-coded data, and start asap to work on the back-end. Move your boilerplate data on it, just so you can make sure they communicate well. Then, try to work on both simultaneously. That way, if you change your mind about something on one side, you won't have to redo much code on the other side.
Regarding the back-end solution, pretty much all I can say is that I used Parse.com services, and it's really good. In my case, I was not ready to create an entire back-end by myself. If you can, maybe you don't need them. But, (and it's a big one), Parse's SDK can take care of the whole communication between your back-end and your front-end. You don't have to manage network availability, caching stuff, and every thing you have to think about when you develop for a mobile OS. This is very nice.
Their free plan lets you run 1M queries every month, which is quite a lot. But if you want to go further and reduce the number of requests to Parse, you can combine your own back-end and theirs. It may not work for your specific case, but you can have the user access your server to check if new data is available, and only then query Parse. For example, for a news app, have the news on parse.com, store the most recent news date on your server, save the last update date on the client device, and before accessing parse, compare the dates with your server. If needed, query parse, if not, go to the cache (handled by parse's SDK). That way you can limit the number of queries and stay in the free plan.
You should probably try to estimate the number of queries you'll have per month and the monetary impact before choosing.
Just my own opinion :]
I would suggest you to add new features to your app with the smallest possible complexity. Like e. g. "The user can see a list of all registered users." - This example might not fit perfectly well for your case, but I hope you get the point: build one small thing at a time.
But for these small things: make the full trip front and back. Since it shouldn't take you too long to complete such a feature, it doesn't really matter if you complete the frontend or the backend first. So for this part: basically what #rdurand already said ;)
Regarding the backend I see two options:
Either you create some REST-Services yourself. The choice of technilogy should depend on what you know already. I am a big fan of JAX-RS, but if you don't already have some java experience you might have hard time with this.
Use some kind of SAAS-API. I've heard some good things about http://www.apiomat.com/, but never used it myself...
Good luck ;)

Read-only web apps with Rails/Sinatra

I am working on an administrative web app in Rails. Because of various implementation details that are not really relevant, the database backing this app will have all of the content needed to back another separate website. It seems like there are two obvious options:
Build a web app that somehow reads from the same database in a read-only fashion.
Add a RESTful API to the original app and build the second site in such a way as for it to take its content from the API.
My question is this: are either of these options feasible? If so, which of them seems like the better option? Do Rails, Sinatra, or any of the other Rack-based web frameworks lend themselves particularly well to this sort of project? (I am leaning towards Sinatra because it seems more lightweight than Rails and I think that my Rails experience will carry-over to it nicely.)
Thanks!
Both of those are workable and I have employed both in the past, but I'd go with the API approach.
Quick disclaimer: one thing that's not clear is how different these apps are in function. For example, I can imagine the old one being a CRUD app that works on individual records and the new one being a reporting app that does big complicated aggregation queries. That makes the shared DB (maybe) more attractive because the overlap in how you access the data is so small. I'm assuming below that's not the case.
Anyway, the API approach. First, the bad:
One more dependency (the old app). When it breaks, it takes down both apps.
One more hop to get data, so higher latency.
Working with existing code is less fun than writing new code. Just is.
But on the other hand, the good:
Much more resilient to schema changes. Your "old" app's API can have tests, and you can muck with the database to your heart's content (in the context of the old app) and just keep your API to its spec. Your new app won't know the difference, which is good. Abstraction FTW. This the opposite side of the "one more dependency" coin.
Same point, but from different angle: in the we-share-the-database approach, your schema + all of SQL is effectively your API, and it has two clients, the old app and the new. Unless your two apps are doing very different things with the same data, there's no way that's the best API. It's too poorly defined.
The DB admin/instrumentation is better. Let's say you mess up some query and hose your database. Which app was it? Where are these queries coming from? Basically, the fewer things that can interact with your DB, the better. Related: optimize your read queries in one place, not two.
If you used RESTful routes in your existing app for the non-API actions, I'm guessing your API needs will have a huge overlap with your existing controller code. It may be a matter of just converting your data to JSON instead of passing it to a view. Rails makes it very easy to use an action to respond to both API and user-driven requests. So that's a big DRY win if it's applicable.
What happens if you find out you do want some writability in your new app? Or at least access to some field your old app doesn't care about (maybe you added it with a script)? In the shared DB approach, it's just gross. With the other, it's just a matter of extending the API a bit.
Basically, the only way I'd go for the shared DB approach is that I hated the old code and wanted to start fresh. That's understandable (and I've done exactly that), but it's not the architecturally soundest option.
A third option to consider is sharing code between the two apps. For example, you could gem up the model code. Now your API is really some Ruby classes that know how to talk to your database. Going even further, you could write a Sinatra app and mount it inside of the existing Rails app and reuse big sections it. Then just work out the routing so that they look like separate apps to the outside world. Whether that's practical obviously depends on your specifics.
In terms of specific technologies, both Sinatra and Rails are fine choices. I tend towards Rails for bigger projects and Sinatra for smaller ones, but that's just me. Do what feels good.

How to prevent hackers from scraping our database? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How do you stop scripters from slamming your website hundreds of times a second?
I am building a web application in RubyOnRails, which is based on a large body of data. The application makes for powerful navigation and intersection of the data, as well as a community model for adding more data.
In that respect one could compare it with StackOverflow.com: a big bunch of data, structured in a fairly simple way.
I intend to offer the content under a CreativeCommons license, but if the site "hits it off", I need to discourage copycats. My biggest fear is screen scraping scripters, not only leeching away the raw data, but also incurring huge usage peaks on my servers.
I wonder if RubyOnRails offers any way to throttle (obviously automated) requests, e.g. to reduce their response-time at the benefit of regular users. Perhaps this requires Apache or Phusion Passenger settings?
EDIT: My target is not to recognize user types, but to reduce responsiveness to overly active users, e.g. maximize the number of requests handled per IP address per unit of time (?)
My suggestion would be to limit any easy iterative navigation of your websites which was the primary way I have seen harvesting programs work. The simple encryption of your id numbers used as GET variables would make stripmining your info more difficult. You can only try and make getting your information onerous. You won't be able to prevent it completely.
You could present a captcha to the "overly active users", just like SO does when you edit too fast. That should effectively hinder automatic spider like scraping.
You might also want to look into using some Rack middleware to do rate limiting, like this recent article covered for doing API limiting (such as what you'd want at Twitter or similar).
I believe all you could do is put hoops for the user to jump though. Ultimately there is no foolproof way to distinguish a regular user from a bot.

Ruby on Rails scalability/performance? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I have used PHP for awhile now and have used it well with CodeIgniter, which is a great framework. I am starting on a new personal project and last time I was considering what to use (PHP vs ROR) I used PHP because of the scalability problems I heard ROR had, especially after reading what the Twitter devs had to say about it. Is scalability still an issue in ROR or has there been improvements to it?
I would like to learn a new language, and ROR seems interesting. PHP gets the job done but as everyone knows its syntax and organization are fugly and it feels like one big hack.
To expand on Ryan Doherty's answer a bit...
I work in a statically typed language for my day job (.NET/C#), as well as Ruby as a side thing. Prior to my current day job, I was the lead programmer for a ruby development firm doing work for the New York Times Syndication service. Before that, I worked in PHP as well (though long, long ago).
I say that simply to say this: I've experienced rails (and more generally ruby) performance problems first hand, as well as a few other alternatives. As Ryan says, you aren't going to have it automatically scale for you. It takes work and immense amounts of patience to find your bottlenecks.
A large majority of the performance issues we saw from others and even ourselves were dealing with slow performing queries in our ORM layer. We went from Rails/ActiveRecord to Rails/DataMapper and finally to Merb/DM, each iteration getting more speed simply because of the underlying frameworks.
Caching does amazing wonders for performance. Unfortunately, we couldn't cache our data. Our cache would effectively be invalidated every five minutes at most. Nearly every single bit of our site was dynamic. So if/when you can't do that, perhaps you can learn from our experience.
We had to end up seriously fine tuning our database indexes, making sure our queries weren't doing very stupid things, making sure we weren't executing more queries than was absolutely necessary, etc. When I say "very stupid things", I mean the 1 + N query problem...
# 1 query
Dog.find(:all).each do |dog|
# N queries
dog.owner.siblings.each do |sibling|
# N queries per above N query!
sibling.pets.each do |pet|
# Do something here
end
end
end
DataMapper is an excellent way to handle the above problem (there are no 1 + N problems with it), but an even better way is to use your brain and stop doing queries like that. When you need raw performance, most of the ORM layers won't easily handle extremely custom queries, so you might as well hand write them.
We also did common sense things. We bought a beefy server for our growing database, and moved it off onto it's own dedicated box. We also had to do TONS of processing and data importing constantly. We moved our processing off onto its own box as well. We also stopped loading our entire freaking stack just for our data import utilities. We tastefully loaded only what we absolutely needed (thus reducing memory overhead!).
If you can't tell already... generally, when it comes to ruby/rails/merb, you have to scale out, throwing hardware at the problem. But in the end, hardware is cheap; though that's no excuse for shoddy code!
And even with these difficulties, I personally would never start projects in another framework if I can help it. I'm in love with the language, and continually learn more about it every day. That's something that I don't get from C#, though C# is faster.
I also enjoy the open source tools, the low cost to start working in the language, the low cost to just get something out there and try to see if it's marketable, all the while working in a language that often times can be elegant and beautiful...
In the end, it's all about what you want to live, breathe, eat, and sleep in day in and day out when it comes to choosing your framework. If you like Microsoft's way of thinking, go .NET. If you want open source but still want structure, try Java. If you want to have a dynamic language and still have a bit more structure than ruby, try python. And if you want elegance, try Ruby (I kid, I kid... there are many other elegant languages that fit the bill. Not trying to start a flame war.)
Hell, try them all! I tend to agree with the answers above that worrying about optimizations early isn't the reason you should or shouldn't pick a framework, but I disagree that this is their only answer.
So in short, yes there are difficulties you have to overcome, but the elegance of the language, imho, far outweighs those shortcomings.
Sorry for the novel, but I've been there and back with performance issues. It can be overcome. So don't let that scare you off.
RoR is being used with lots of huge websites, but as with any language or framework, it takes a good architecture (db scaling, caching, tuning, etc) to scale to large numbers of users.
There's been a few minor changes to RoR to make it easier to scale, but don't expect it to scale magically for you. Every website has different scaling issues, so you'll have to put in some work to make it scale.
Develop in the technology that is going to give your project the best chance of success - quick to develop in, easy debugging, easy deployment, good tools, you know it inside out (unless the point is to learn a new language), etc.
If you get tens of million of uniques a month you can always hire in a couple of people and rewrite in a different technology if you need to as ...
... you'll be rake-ing in the cache (sorry - couldn't resist!!)
First of all, it would perhaps make more sense to compare Rails to
Symfony, CodeIgniter or CakePHP, since Ruby on Rails is a complete web application
framework. Compared to PHP or PHP frameworks, Rails applications offer
the advantages that they are small, clean, and readable. PHP is perfect
for small, personal pages (originally it stood for "Personal Home Page"),
while Rails is a full MVC framwork which can be used to build large
sites.
Ruby on Rails has not a larger scalability issue than comparable PHP frameworks.
Both Rails and PHP will scale well if you have only a moderate number
of users (10,000-100,000) which operate on a similar number of objects.
For a few thousand users a classic monolithic architecture will
be sufficient. With a bit of M&M (Memcached and MySQL) you can also
handle millions of objects. The M&M architecture uses a MySQL server to
handle writes and Memcached to handle high read loads. The traditional
storage pattern, a single SQL server using normalized relational tables
(or at best a SQL Master/Multiple Read Slave setup), no longer works
for very large sites.
If you have billions of users like Google, Twitter and Facebook, then
probably a distributed architecture will be better. If you really want to
scale your application without limit, use some kind of cheap commodity hardware
as a foundation, divide your application into a set of services, keep
each component or service scalable itself (design every component as
a scalable service), and adapt the architecture to your application.
Then you will need suitable scalable datastores like NoSQL databases
and distributed hash tables (DHTs), you will need sophisticated map-reduce
algorithms to work with them, you will have to deal with SOA, external
services, and messaging. Neither PHP nor Rails offer a magic bullet here.
What is breaks down to with RoR is that unless you're in Alexa's top 100, you will not have any scalability problems. You'll have more issues with stability on shared hosting unless you can squeeze Phusion, Passenger, or Mongrel out.
Take a little while to look at the problems the Twitter people had to deal with, then ask yourself if your app is going to need to scale to that level.
Then build it in Rails anyway, because you know it makes sense. If you get to Twitter-level volumes then you'll be in the happy position of considering performance optimisaton options. At least you'll be applying them in a nice language!
You can't compare PHP and ROR, PHP is a scripting language as Ruby, and Rails is a framework as CakePHP.
Stated that, I strongly suggest you Rails, because you will have an application strictly organized in MVC pattern, and this is a MUST for your scalability requirement. (Using PHP you had to take care about the project organization on your own).
But for what about scalability, Rails it's not just MVC: For instance, you can start to develop your application with a database, changing it on road without any effort (in the most part of cases), so we can state that a Rails application is (almost) database indipendent because it's ORM (that allow you to avoid database query), you can do a lot of other stuff. (take a look to this video http://www.youtube.com/watch?v=p5EIrSM8dCA )
Just wanted to add some more info to Keith Hanson's smart point about 1 + N problem where he states:
DataMapper is an excellent way to handle the above problem (there are no 1 + N problems with it), but an even better way is to use your brain and stop doing queries like that. When you need raw performance, most of the ORM layers won't easily handle extremely custom queries, so you might as well hand write them.
Doctrine is one of the most popular ORM's for PHP. It addresses this 1 + N complexity problem intrinsic to ORMs by providing a language called Doctrine Query Language (DQL). This allows you to write SQL like statements that use your existing model relationships. e.g
$q = Doctrine_Query::Create()
->select(*)
->from(ModelA m)
->leftJoin(m.ModelB)
->execute()
I'm getting the impression from this thread that the scalability issues of ROR come down primarily to the mess that ORMs are in with regard to loading child objects - ie the '1+N' problem mentioned above. In the above example that Ryan gave with dogs and owners:
Dog.find(:all).each do |dog|
#N queries
dog.owner.siblings.each do |sibling|
#N queries per above N query!!
sibling.pets.each do |pet|
#Do something here
end
end
end
You could actually write a single sql statement to get all that data, and you could also 'stitch' that data up into the Dog.Owner.Siblings.Pets object heirarchy of your custom-written objects. But could someone write an ORM that did that automatically, so that the above example would incur a single round-trip to the DB and a single SQL Statement, instead of potentially hundreds? Totally. Just join those tables into one dataset, then do some logic to stitch it up. It's a bit tricky to make that logic generic so it can handle any set of objects but not the end of the world. In the end, tables and objects only relate to each other in one of three categories (1:1, 1:many, many:many). It's just that no one ever built that ORM.
You need a syntax that tells the system upfront what children you want to load for this particular query. You can sort of do this with the 'eager' loading of LinqToSql (C#), which is not a part of ROR, but even though that results in one round trip to the DB, it's still hundreds of separate SQL statements the way it has currently been set up. It's really more about the history of ORMs. They just got started down the wrong path with that and never really recovered in my opnion. 'Lazy loading' is the default behavior of most ORMs, ie incurring another round trip for every mention of a child object, which is crazy. Then with 'eager' loading - loading the children upfront, that is set up statically in everything I am aware outside of LinqToSql - ie which children always load with certain objects - as if you would always need the same children loaded when you loaded a collection of Dogs.
You need some kind of strongly-typed syntax saying that this time I want to load these children and grandchilren. Ie, something like:
Dog.Owners.Include()
Dog.Owners.Siblings.Include()
Dog.Owners.Siblings.Pets.Include()
then you could issue this command:
Dog.find(:all).each do |dog|
The ORM system would know what tables it needs to join, then stitch up the resulting data into the OM heirarchy. It's true that you can throw hardware at the current problem, which I'm generally in favor of, but it's no reason the ORM (ie Hibernate, Entity Framework, Ruby ActiveRecord) shouldn't just be better written. Hardware really doesn't bail you out of an 8 round-trip, 100-SQL statement query that should have been one round trip and one SQL statement.

Resources