I'm taking another look at GraphQL, and I'm trying to understand why saving round trips is a benefit to developers. Is the cost of making a request really that expensive? I'm coming from a web development background.
Let's compare a standard REST API with a GraphQL API.
I need to retrieve a user's personal info and a list of their friends. A traditional REST API might require two calls: one to get the personal info, and one to get the friends.
With GraphQL, I get the same result with one request.
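To make that concrete, here is a rough sketch of the two approaches; the host, endpoint paths, and field names are invented for the example and aren't any particular API:

```python
import requests

API = "https://api.example.com"  # hypothetical host, not a real API

# REST: two round trips over the public network.
user = requests.get(f"{API}/users/42").json()
friends = requests.get(f"{API}/users/42/friends").json()

# GraphQL: one round trip, asking for exactly the fields the page needs.
query = """
query {
  user(id: 42) {
    name
    email
    friends {
      id
      name
    }
  }
}
"""
response = requests.post(f"{API}/graphql", json={"query": query}).json()
user_and_friends = response["data"]["user"]
```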
But as a frontend developer, I want my page to spend as little time as possible in a blank, un-rendered state. I would rather render a portion of the page as quickly as possible than wait for all the information to arrive in one go before rendering anything.
Now, from my understanding, GraphQL was partly created to solve mobile application API issues. Is there something about iOS apps that makes it more beneficial to load all the data at once, as opposed to making parallel requests? Or is there something else I'm missing?
So generally speaking, network traffic inside the system you control (i.e. your backend) is fast. A request that may take 205ms (200ms network, 5ms data) to the outside world might take just 6ms (1ms network, 5ms data) internally.
If you want to build a screen of your app, and that requires making two REST requests because the second depends on the result of the first, you're looking at (given these crude numbers) 410ms to get the data you need to render your screen.
With GraphQL (or any other gateway layer that consolidates the data), you'd get everything in slightly over 212ms (200ms latency to the GraphQL server + 6ms for each internal call).
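Written out, the arithmetic with those crude numbers looks like this (the figures are the rough estimates above, not measurements):

```python
# Crude latency model, all values in milliseconds (estimates, not measurements).
EXTERNAL_NETWORK = 200   # client <-> public API
INTERNAL_NETWORK = 1     # gateway <-> internal service
DATA = 5                 # time to produce the data itself

external_call = EXTERNAL_NETWORK + DATA            # 205 ms
internal_call = INTERNAL_NETWORK + DATA            # 6 ms

# Two dependent REST calls, both crossing the public network:
rest_sequential = 2 * external_call                # 410 ms

# One call to a GraphQL gateway, which makes the two dependent calls internally:
graphql_gateway = EXTERNAL_NETWORK + 2 * internal_call   # 212 ms

print(rest_sequential, graphql_gateway)            # 410 212
```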
In scenarios where you could make all your requests in parallel (i.e. they don't depend on the data from other requests), the performance benefits aren't quite as apparent, but you'll find that these situations are actually pretty rare as the complexity of your app grows.
The general rule of thumb with GraphQL is that your initial query fetches enough data for the page to be functional; if there's less important content, you can always fetch that with a follow-up query.
Alongside the performance benefits, letting mobile devices make fewer network requests is a big win when it comes to battery life. Network usage is expensive, and should be avoided as much as possible.
Before using GraphQL instead of a REST API, one needs to understand the benefits of GraphQL over REST.
GraphQL is a query language, and it uses the type system you define (GraphQLInt, GraphQLString, custom types...).
REST:
multiple round trips - slow
bloated responses (over-fetching)
many endpoints
GraphQL:
one endpoint
declarative: types, queries, mutations
you ask for the exact shape of the data you want in one call
no underfetching, no overfetching of data
the query mirrors the result, which means better performance (see the sketch below)
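The answer doesn't name a server library, but as one illustration of the type system and the query == result idea, here is a minimal sketch using the Python graphene library with an invented User type:

```python
import graphene  # one possible Python GraphQL library; not named in the answer

# Invented example type: the schema defines the shape clients can ask for.
class User(graphene.ObjectType):
    id = graphene.Int()
    name = graphene.String()

class Query(graphene.ObjectType):
    user = graphene.Field(User, id=graphene.Int(required=True))

    def resolve_user(root, info, id):
        # Placeholder data source, just to show the query == result shape.
        return User(id=id, name="Alice")

schema = graphene.Schema(query=Query)

# The client asks only for `name`, so only `name` comes back.
result = schema.execute("{ user(id: 1) { name } }")
print(result.data)  # {'user': {'name': 'Alice'}}
```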
We are following the Embedded Architecture for our S/4HANA 1610 system.
Please let me know what the impact on the server will be if we implement 200+ standard Fiori apps in our system.
Regards,
Sayed
When you say “server”, are you referring to the ABAP backend, consisting of one or more SAP application servers and usually one database server?
In this case, you might get a very first impression using transaction ST03.
Here, you get a detailed analysis of resource consumption on the SAP application server.
You also get information about database access times, as seen from the application server.
This can give you a good hint about resource consumption on the database server.
Usually, the ABAP backend is accessed from Fiori via OData calls.
Not every user interaction causes an OData call; some interactions are handled locally at the frontend.
In general, implemented apps only require some space on the hard disk, as long as nobody is using them.
So the important questions for defining the expected workload are:
How many users are working with these apps, and at what frequency (average think time)?
How many OData calls are sent from these apps to the backend, and how many dialog steps are handled by the frontend itself?
How expensive are these OData calls (see ST03)?
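To turn those questions into a rough load estimate, a back-of-the-envelope sketch might look like this; every number below is a placeholder to be replaced by your own measurements (e.g. from ST03):

```python
# Back-of-the-envelope OData load estimate; all numbers are placeholders
# to be replaced with your own figures (e.g. measured via ST03).
concurrent_users = 500          # users actively working in the Fiori apps
think_time_s = 30               # average think time between dialog steps
odata_calls_per_step = 2        # OData calls triggered per dialog step
avg_odata_cpu_s = 0.050         # average backend CPU seconds per OData call

steps_per_second = concurrent_users / think_time_s
odata_calls_per_second = steps_per_second * odata_calls_per_step
backend_cpu_per_second = odata_calls_per_second * avg_odata_cpu_s

print(f"{odata_calls_per_second:.1f} OData calls/s, "
      f"~{backend_cpu_per_second:.2f} CPU-seconds of backend work per second")
```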
Every app reflects one or more typical business processes, which need to be defined.
Your specific Customizing also plays an important role, because it controls different internal functionality.
It's also mandatory to optimize database access, because in productive use tables grow in size, which can slow down database access over time.
Usually, this kind of sizing is done by SAP Hardware and Technology partners.
I'm in the middle of prototyping a social network (using ROR 3) and decided to check out Neo4j, and while it looks great, I have a question about scaling and performance in terms of design.
I've researched how Etsy puts together an activity feed (see http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture ), and I understand how messaging queues can fan out activities (such as sharing a picture and making this activity available to your 500 or so friends in their news feeds). I also understand how news feeds can be cached (memcache) and how lookups can be performed against Redis.
All in all, it seems that to make a high-performance activity feed that scales well (and a social network in general), the common pattern is to use sharding, horizontal scaling, memcache, RabbitMQ, Redis, MongoDB, InnoDB (MySQL), etc., all in an attempt to compensate for high volumes, disk reads, and so on. But this is quite a bit of overhead in terms of design.
Can Neo4j eliminate the need, at least early on, for such an arrangement? I mean, is it so fast that I don't need to set up a message queue for fan-outs and messaging, don't need to set up an "activities" cache for every action a user performs, and can use it to handle both ordering and storing messages? Can a news feed like Facebook's be created with such a system, or is the high-performance activity feed limited to basic status updates?
If those questions are too broad, let me ask it a different way: could I write Facebook or Twitter using Neo4j and eliminate the need for message queuing to fan out updates (instead I want to get a live stream of updates on the fly), memcache for news feeds, and cached activity feed objects? Or will I find myself doing the same thing or even more to handle hundreds of requests per second?
I ask because it would save quite a bit of time to use Neo4j if it can indeed handle high volumes without having to use the tricks Etsy, Twitter, and Facebook employ to maintain high performance.
Yes. In fact, it's been done already by Rene Pickhardt.
http://www.rene-pickhardt.de/graphity-an-efficient-graph-model-for-retrieving-the-top-k-news-feeds-for-users-in-social-networks/
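As a minimal sketch of the fan-out-on-read idea (computing the feed with a query at read time instead of pushing updates through a message queue), here is what it could look like with the official neo4j Python driver. The User/Post/FRIEND graph model, credentials, and property names are invented for the example, and this is not the Graphity model itself:

```python
from neo4j import GraphDatabase

# Invented graph model: (:User)-[:FRIEND]->(:User)-[:POSTED]->(:Post)
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

FEED_QUERY = """
MATCH (me:User {id: $user_id})-[:FRIEND]->(friend:User)-[:POSTED]->(post:Post)
RETURN friend.name AS author, post.text AS text, post.created_at AS created_at
ORDER BY post.created_at DESC
LIMIT $limit
"""

def news_feed(tx, user_id, limit=20):
    # Build the feed at read time by walking the friend relationships.
    return [record.data() for record in tx.run(FEED_QUERY, user_id=user_id, limit=limit)]

with driver.session() as session:
    feed = session.read_transaction(news_feed, user_id=42)

for item in feed:
    print(item["author"], item["text"])
```

Whether a read-time query like this stays fast under heavy load depends on your data volumes; Graphity's model (linked above) exists precisely to keep feed retrieval efficient at scale.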
If you have a complex set of requirements with many users (and servers), how will your WebSocket infrastructure (server or servers) scale, especially with broadcasting?
Of course, broadcasting is not part of any WebSocket spec, but it's there even in basic chat examples (the "hello world" of WebSockets).
The client-side solution (asking for new data) still seems more scalable to me than the server-side (broadcasting) solution, even given WebSockets' low latency and relatively cheap (no HTTP headers) nature.
Edit:
OK, suppose you want to replace all your AJAX code with WebSocket implementations, which may mean a great many connections across many different contexts. This adds enormous complexity to your system if you want to keep track of every possible scenario for broadcasting.
Low-level (network/thread, etc.) implementation suggestions are also part of the problem, not the solution, because they mean you have to code a special server, unlike general HTTP servers.
Moreover, broadcasting brings a stateful nature to the table, which can't easily scale. Think about adding more servers and load balancing.
Scaling realtime web solutions can be a complex problem, but it is one that services like Pusher (who I work for) have solved, and there are also well-defined solutions for self-hosted realtime web stacks. The PubSub paradigm is well understood and has been solved many times, and solving it requires some state (who is subscribing to what). This paradigm is what handles broadcasting in the types of scenarios you are talking about.
Realtime web technologies have been built with large numbers of simultaneous connections in mind, many from the ground up. If you wanted to create a scalable solution, you would most likely use an existing realtime web server that supports WebSockets; in the same way that it's highly unlikely you would decide to implement your own HTTP server, you are unlikely to want to implement your own WebSocket-capable server from scratch.
Dedicated realtime web servers also let you separate your application logic from your realtime communication mechanism (separation of concerns). Your application might need to maintain some state, but the realtime technology deals with managing subscriptions and connections. How communication between the application and the realtime web technology is achieved is up to you, but frequently message queues are used, and Redis in particular is very popular in this space.
HTTP polling may conceptually be easier to understand: you can maintain statelessness, and with each HTTP poll request you specify exactly what you are looking for. But it most definitely means that you will need to start scaling much sooner (adding more resources to handle the load).
WebSocket polling is something I've not considered before, and I don't think I've seen it suggested anywhere either; the idea that the client should say "I'm ready for my next set of data and here's what I want" is an interesting one. WebSockets have generally taken a leap away from the request/response paradigm, but there may be scenarios where the increased efficiency of WebSockets combined with request/response over them has some benefits. The SocketStream application framework might be worth a look, as it might be relevant; after the initial application load, all communication is performed over WebSockets, which means that even basic request/response functionality uses WebSockets.
However, since we are talking about broadcasting data we need to go back to the PubSub paradigm where it makes much more sense to have active subscriptions and when new data is available that new data is distributed to those active subscriptions (pushed). All your application needs to know is if there are any active subscriptions or not in order to decide whether to publish the data or not. That problem has been solved.
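As a minimal, library-agnostic sketch of that PubSub bookkeeping (in Python; the `send` callbacks stand in for whatever your realtime server uses to push a frame down a WebSocket connection):

```python
from collections import defaultdict
from typing import Callable

# channel name -> set of subscriber callbacks; each callback pushes one
# message down a client's connection (the actual transport is up to you).
subscriptions: dict[str, set[Callable[[str], None]]] = defaultdict(set)

def subscribe(channel: str, send: Callable[[str], None]) -> None:
    subscriptions[channel].add(send)

def unsubscribe(channel: str, send: Callable[[str], None]) -> None:
    subscriptions[channel].discard(send)

def publish(channel: str, message: str) -> None:
    # The application only needs to know whether anyone is subscribed;
    # if so, the new data is pushed to every active subscription.
    for send in list(subscriptions.get(channel, ())):
        send(message)

# Usage: two "connections" represented by plain callbacks.
subscribe("news", lambda msg: print("client A got:", msg))
subscribe("news", lambda msg: print("client B got:", msg))
publish("news", "breaking story")
```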
The idea of websockets is that you keep a persistent connection with each client. When there is new data that you want to send to every client, you already know who all the clients are so you should just send it.
It sounds like you want each client to constantly be sending requests to the server for new data. Why? It seems like that would waste everyone's bandwidth, and I don't know why you think it would be more scalable. Maybe you could add more detail to your question, like what kind of information you are broadcasting, how often, how many bytes, how many clients, etc.
Why not just consider an open websocket connection to be like a standing request from the client for more data?
How does a crawler or spider in a search engine work?
Specifically, you need at least some of the following components:
Configuration: Needed to tell the crawler how, when and where to connect to documents; and how to connect to the underlying database/indexing system.
Connector: This will create the connections to a web page or a disk share or anything, really.
Memory: The pages already visited must be known to the crawler. This is usually stored in the index but it depends on the implementation and the needs. The content is also hashed for de-duplication and updates validation purposes.
Parser/Converter: Needed to be able to understand the content of a document and extract meta-data. Will convert the extracted data to a format usable by the underlying database system.
Indexer: Will push the data and meta-data to the database/indexing system.
Scheduler: Will plan runs of the crawler. Might need to handle a large number of running crawlers at the same time and take into consideration what is currently being done.
Connection algorithm: When the parser finds links to other documents, it needs to analyse when, how, and where the next connections must be made. Also, some indexing algorithms take the page connection graph into consideration, so it might be necessary to store and sort information related to that.
Policy Management: Some sites require crawlers to respect certain policies (robots.txt, for example).
Security/User Management: The crawler might need to be able to log in to some systems to access data.
Content compilation/execution: The crawler might need to execute certain things to be able to access what's inside, like applets/plugins.
Crawlers need to be efficient at working together from different starting points, and at managing speed, memory usage, and a high number of threads/processes. I/O is key.
The world wide web is basically a connected, directed graph of web documents, images, multimedia files, etc. Each node of the graph is a component of a web page; for example, a web page consists of images, text, video, and so on, all of them linked. A crawler traverses this graph using breadth-first search, following the links in web pages.
A crawler initially starts with one (or more) seed points.
It scans the webpage and explores the links in that page.
This process continues until the whole graph is explored (some predefined constraint can be used to limit the search depth); a minimal sketch of this traversal follows below.
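Here is a minimal, single-threaded sketch of that breadth-first traversal using only the Python standard library; a real crawler would add the components listed earlier (politeness/robots.txt handling, de-duplication by content hash, scheduling, indexing):

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed: str, max_depth: int = 2, max_pages: int = 100):
    visited = set()
    queue = deque([(seed, 0)])          # BFS frontier: (url, depth)
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except Exception:
            continue                     # unreachable or non-HTML page
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            queue.append((urljoin(url, link), depth + 1))
    return visited

print(crawl("https://example.com"))
```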
From How Stuff Works
How does any spider start its travels over the Web? The usual starting points are lists of heavily used servers and very popular pages. The spider will begin with a popular site, indexing the words on its pages and following every link found within the site. In this way, the spidering system quickly begins to travel, spreading out across the most widely used portions of the Web.
I am curious about the technology behind a search engine like torrentz.com. From what I could observe, it doesn't host any torrent files, but rather connects you to other servers that do.
You search for keywords, and it brings up a list of potential titles matching your search.
Then you pick one of these, and it provides you with another list of potential servers hosting the corresponding torrent file.
What I'm interested in particularly is the strategy behind gathering and indexing all that content:
How do they collect then aggregate the data?
Is it a submission base service, where each of these servers submits its content for indexing?
Is it a crawling algorithm? If so, how do you even start crawling a site like piratebay.org?
Do they have access to these other servers' databases?
My knowledge and understanding of the bittorrent protocol is not very elaborate, but the documentation that I found online pointed me more toward the processes involved in building a tracker service, which isn't exactly what I'm interested in. Any insight and recommended reading material is appreciated.
To begin with, start indexing their RSS feeds and gathering data from them. The next step would be indexing the pages of portals (like Mininova, TPB, etc.), but watch out for the fact that you can be banned (IP-based) for doing so, since that would generate a huge volume of requests against their servers (I don't think they'd be too happy about that).
That said, I doubt that they have access to the other servers' databases; more likely it's crawling + RSS.
Another thing you can do: when somebody makes a query for an item that you don't have in your database, you make the query against the main BT portals, cache the result in your DB, and then display the results. Then, if another user makes the same query (which is a pretty common scenario), you can show them the cached data plus new data from RSS.
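A minimal sketch of that query-on-miss-then-cache pattern; `search_upstream_portals` is a stand-in for however you would actually query or scrape the portals, and the local database is just a dict here:

```python
# Sketch of the "query on miss, then cache" pattern described above.
local_db: dict[str, list[str]] = {}

def search_upstream_portals(keywords: str) -> list[str]:
    # Placeholder: in reality this would query or scrape the portals'
    # search pages / RSS feeds and parse the results.
    return [f"result for '{keywords}' from an upstream portal"]

def search(keywords: str) -> list[str]:
    if keywords not in local_db:                      # cache miss
        local_db[keywords] = search_upstream_portals(keywords)
    return local_db[keywords]                         # cached for the next user

print(search("ubuntu iso"))   # first query hits the upstream portals
print(search("ubuntu iso"))   # same query served from the local cache
```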