Reduce HTTP requests in OpenLayers - openlayers-3

I have many tile layers in a map. So when I zoom or move the map, all layers send WMS requests to the server. How can I reduce HTTP requests when zooming the map, by merging different layers' requests into a single request?

You can use the maxTilesLoading option to change the maximum number of simultaneous tile requests OpenLayers sends while interacting with a map, e.g.:
import Map from 'ol/Map';
import View from 'ol/View';
import TileLayer from 'ol/layer/Tile';
import OSM from 'ol/source/OSM';

var map = new Map({
  maxTilesLoading: 5, // default is 16
  layers: [
    new TileLayer({
      source: new OSM()
    })
  ],
  target: 'map',
  view: new View({
    center: [0, 0],
    zoom: 2
  })
});
The default is 16, but if you have tiles that take a second or two to load, you can end up with a big backlog of unfinished requests being processed by the server. This is because browsers only allow around 10-20 active HTTP requests to a given server at the same time; any more and the request just hangs until the queue shrinks. This can make the site grind to a halt, with an increasing number of requests from the browser waiting for the queue to drain before they can be processed. Constantly moving/zooming a map sends a large number of HTTP requests from the browser to the server; reducing this with maxTilesLoading lowers the chance of overloading that limit. The trade-off is that the map might load a little slower at first, as more work is done in series, but it vastly improves the overall performance when interacting with the map.

If all your WMS layers come from the same server, you can combine them into a single ol.layer.Layer object (more precisely, into one source object) by defining the layers as a comma-separated list under the 'LAYERS' parameter:
new ol.layer.Image({
  extent: [-13884991, 2870341, -7455066, 6338219],
  source: new ol.source.ImageWMS({
    url: 'https://ahocevar.com/geoserver/wms',
    params: {'LAYERS': 'topp:states,topp:population'}, // <--- comma-separated layer list
    ratio: 1,
    serverType: 'geoserver'
  })
})
There are other things you could try to reduce the number of requests / enhance the overall performance:
- You mentioned "many tile layers", and then WMS requests. If you're not using a tile caching method, you could look into one. Example: MapCache.
- You could also do a single big image request instead of many small tile ones to render your layer(s) (the example above does that). This approach reduces the number of requests, but the map can feel slower to render. It also makes the result harder to cache, as each request is very often unique (it depends on the extent of the map view).

Related

Getting Locust to send a predefined distribution of requests per second

I previously asked this question about using Locust as the means of delivering a static, repeatable request load to the target server (n requests per second for five minutes, where n is predetermined for each second), and it was determined that it's not readily achievable.
So, I took a step back and reformulated the problem into something that you probably could do using a custom load shape, but I'm not sure how – hence this question.
As in the previous question, we have a 5-minute period of extracted Apache logs, where each second, anywhere from 1 to 36 GET requests were made to an Apache server. From those logs, I can get a distribution of how many times a certain requests-per-second rate appeared; e.g. there's a 1/4000 chance of 36 requests being processed on any given second, 1/50 for 18 requests to be processed on any given second, etc.
I can model the distribution of request rates as a simple Python list: each number between 1 and 36 appears in it as many times as that request rate occurred during the 5-minute period captured in the Apache logs. Then I can just randomly pick a number from the list in the tick() method of a custom load shape to get a number that informs the (user count, spawn rate) calculation.
Additionally, by using a predetermined random seed, I can make the test runs repeatable to within an acceptable level of variation to be useful in testing my API server configuration changes, since the same random list elements should be retrieved each time.
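For illustration, a minimal sketch of that list-plus-seed idea (the rate counts below are invented, not taken from the real logs):

    import random

    # Hypothetical counts: rate_counts[r] = number of seconds in the logs
    # during which r requests/s were observed.
    rate_counts = {1: 120, 2: 90, 18: 6, 36: 1}

    # Flatten into a list where each rate appears as often as it was observed,
    # so random draws follow the empirical distribution.
    repro_data = [rate for rate, n in rate_counts.items() for _ in range(n)]

    random.seed(42)  # fixed seed => repeatable draws across test runs
    next_rate = random.choice(repro_data)  # one draw per tick()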
The problem is that I'm not yet able to "think in Locust", to think in terms of user counts and spawn rates instead of rates of requests received by the server.
The question becomes this:
How do you implement the tick() method of a custom load shape in such a way that the (user count, spawn rate) tuple results in a roughly known distribution of requests per second to be sent, possibly with the help of other configuration options and plugins?
You need to create a Locust User with the tasks you want it to run (e.g. make your HTTP calls). You can define the time between tasks to roughly control the requests per second. If you have a task that makes a single HTTP call and define wait_time = constant(1), you get roughly 1 request per second per user. Locust's spawn_rate is a per-second unit. Since you already have the data you want to reproduce, and it's in 1-second intervals, you can then create a LoadTestShape class with a tick() method somewhat like this:
from locust import LoadTestShape

class MyShape(LoadTestShape):
    repro_data = […]  # the per-second request rates extracted from the logs
    last_user_count = 0

    def tick(self):
        if len(self.repro_data) > 0:
            requests_per_second = self.repro_data.pop(0)
            # spawn_rate must cover the jump (up or down) within one second
            requests_per_second_diff = abs(self.last_user_count - requests_per_second)
            self.last_user_count = requests_per_second
            return (requests_per_second, requests_per_second_diff)
        return None  # stop the test when the data runs out
If your first data point is 10 requests, you'd need requests_per_second=10 and requests_per_second_diff=10 to make Locust spin up all 10 users in a single second. If the next second is 25, you'd have requests_per_second=25 and requests_per_second_diff=15. In a Load Shape, spawn_rate also works for decreasing the number of users. So if next is 16, requests_per_second=16 and requests_per_second_diff=9.
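To complement the shape above, here is a minimal sketch of a matching user class; the host and endpoint are placeholders, not from the question:

    from locust import HttpUser, constant, task

    class ReplayUser(HttpUser):
        host = "http://localhost:8080"  # placeholder target
        wait_time = constant(1)  # one task iteration per user per second

        @task
        def replay_get(self):
            self.client.get("/")  # stand-in for the logged GET requests

With wait_time = constant(1), the user count returned by tick() maps roughly one-to-one to requests per second.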

Circumventing negative side effects of default request sizes

I have been using Reactor pretty extensively for a while now.
The biggest caveat I have had coming up multiple times is default request sizes / prefetch.
Take this simple code for example:
Mono.fromCallable(System::currentTimeMillis)
    .repeat()
    .delayElements(Duration.ofSeconds(1))
    .take(5)
    .doOnNext(n -> log.info(n.toString()))
    .blockLast();
To the eye of someone who has worked with other reactive libraries before, this piece of code should log the current timestamp once a second, five times.
What really happens is that the same timestamp is returned five times, because delayElements doesn't send one request upstream for every elapsed duration, it sends 32 requests upstream by default, replenishing the number of requested elements as they are consumed.
This wouldn't be a problem if the system property for overriding the default prefetch (reactor.bufferSize.x) weren't capped to a minimum of 8.
This means that if I want to write real reactive code like above, I have to set the prefetch to one in every transformation. That sucks.
Is there a better way?

How to paginate using Orient 3.0 streams

According to the streaming example at http://orientdb.com/docs/3.0.x/java/Java-Query-API.html, we can use the OrientDB result set streaming API as follows:
ODatabaseDocument db;
...
String statement = "SELECT FROM V WHERE name = ? and surname = ?";
OResultSet rs = db.query(statement, "John", "Smith");
rs.stream().forEach(x -> System.out.println(x.getProperty("age")));
rs.close();
This is fine but too trivial - what if we need to keep the rs/stream around? We can't very well close the resultset because we want to reuse the stream on a subsequent user request in a web application, say (in scenarios such as paging).
But to keep the streams "alive" the Orient user guide says that:
OResultSet is implemented as a paginated structure that holds some iterators open during the iteration. This is true both in remote and in embedded usage.
You should always invoke OResultSet.close() at the end of the execution, to free resources.
OResultSet instances are automatically closed when you close the ODatabase that returned them.
It is important to always close result sets, even when they are converted to streams (after the stream is consumed).
Are there any best practices around this? As far as I can tell, we would need to:
1) Keep the Orient database connection open until the user "paging" session is done (which could be say 5-10 minutes). Only when the user says "done" can we close the result set & close the database connection. The Orient database connection (and whatever stream it generated) thus becomes "private" to a single application user. Moreover, since every user request can be activated on a different thread, the said database connection would need to be made active on the current thread before using it.
2) Use the Java Stream API to navigate through arbitrary subsets of the "arbitrarily" large resultset. How would memory usage be handled by the underlying Orient db stream implementation? What determines the memory usage for using a "single rs/stream" and keeping it around for a while? What happens when we have thousands of open rs/streams especially if each user has their own "private" rs/stream they're looking at?
3) If a given Orient database connection can only be used on a single thread at a time (an Orient requirement), how do we handle multiple users with their own custom long-lived rs/streams/connections? Does this mean that if we have 1000 clients using their own private rs/stream (that they hang on to for, say, 5 minutes), then we have to keep 1000 database connections open (i.e. one per user/rs)? What are the limits around this? This style is obviously quite different from the more typical execute-query/close-rs pattern for quick, stateless user interactions (naive paging that re-executes the query for each requested range, which can get expensive).
P.S. I realize that once we get a Java stream, we pretty much start just using the Java API itself, so I suppose that JOOQ streaming usage (for example) would be pretty similar to Orient streaming usage once you get into the Stream interfaces. I'm not familiar with the Java Streams API, but I suppose "How to paginate a list of objects in Java 8?" is a good place to start?
My conclusion is that streaming works well when scrolling through a large result set without consuming a large amount of memory or having to keep re-executing offset/limit queries (similar to forward only scrolling over JDBC resultsets). A typical use case is an export scenario.
For forward and backward paging, in Orient at least, you likely need an indexed property/properties and perform range queries - you'll need to make sure the index is SB-tree so that it supports range queries.
FYI, Solr has a cursor mechanism which works pretty well for forward pagination on sorted results; if you keep some simple state markers on the client, you can also go back to results already encountered. "Go to" a random page is not supported with Solr cursors, but you can always re-sort/filter on some other criteria to move "useful" results to the top of the result set instead of deep paging (https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html).
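As a rough illustration of that cursor mechanism, a sketch in Python (the core URL and the uniqueKey field "id" are assumptions):

    import requests

    SOLR = "http://localhost:8983/solr/mycore/select"  # assumed core URL

    def pages(query, rows=50):
        # Forward-only cursor pagination; the sort must include the uniqueKey.
        cursor = "*"
        seen_marks = []  # keep markers client-side to re-fetch earlier pages
        while True:
            resp = requests.get(SOLR, params={
                "q": query, "rows": rows, "wt": "json",
                "sort": "id asc", "cursorMark": cursor,
            }).json()
            seen_marks.append(cursor)
            yield resp["response"]["docs"]
            next_cursor = resp["nextCursorMark"]
            if next_cursor == cursor:  # Solr signals the end by repeating the mark
                return
            cursor = next_cursor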

restricting WMS tile extents

I'm brand new to openlayers, I've just been reading the docs today since I have a need to do some testing with it. I'd like to know if there's a way in openlayers to restrict the maximum geographic extent in a WMS tile request.
I've read that WMS bounding boxes are generated automagically, which is neat, but I have some issues with WMS requests on very large datasets, which tend to make the underlying WMS server struggle. That's not really changeable, so we need to work around it, and one strategy is to request only small subsets at a time (~5 or 10 degree squares at most).
So - is there a way in openlayers to say 'get me at most 5 degrees by 5 degrees in a single WMS request, and build my map layer from those'?
Thanks in advance
Yes, you can set an extent for your layer like this:
new ol.layer.Tile({
  extent: ol.proj.transformExtent([30, 30, 50, 50], "EPSG:4326", "EPSG:3857"),
  // your WMS source and other options
})
Please make sure to use the right projection; this example assumes your map is in Web Mercator.

Real-time pipeline feedback loop

I have a dataset with potentially corrupted/malicious data. The data is timestamped. I'm rating the data with a heuristic function. After a period of time I know that all new data items coming in with certain IDs need to be discarded, and they represent a significant portion of the data (up to 40%).
Right now I have two batch pipelines:
First one just runs the rating over the data.
The second one first filters out the corrupted data and runs the analysis.
I would like to switch from batch mode (say, running every day) into an online processing mode (hope to get a delay < 10 minutes).
The second pipeline uses a global window which makes processing easy. When the corrupted data key is detected, all other records are simply discarded (also using the discarded keys from previous days as a pre-filter is easy). Additionally it makes it easier to make decisions about the output data as during the processing all historic data for a given key is available.
The main question is: can I create a loop in a Dataflow DAG? Let's say I would like to accumulate the quality rates given to each session window I process, and if the rate sum for a key exceeds X, a filter function in an earlier stage of the pipeline should filter out that malicious key.
I know about side inputs, but I don't know if they can change during runtime.
I'm aware that a DAG by definition cannot have a cycle, but how do I achieve the same result without one?
One idea that comes to mind is to use a side output to mark an ID as malicious and create a fake unbounded output/input: the output would dump the data to some storage, and the input would load it every hour and stream it so it can be joined.
Side inputs in the Beam programming model are windowed.
So you were on the right path: it seems reasonable to have a pipeline structured as two parts: 1) computing a detection model for the malicious data, and 2) taking the model as a side input and the data as a main input, and filtering the data according to the model. This second part of the pipeline will get the model for the matching window, which seems to be exactly what you want.
In fact, this is one of the main examples in the MillWheel paper (page 2), upon which Dataflow's streaming runner is based.
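To make that two-part structure concrete, here is a minimal batch-flavored sketch using the Beam Python SDK; the rate() heuristic, the threshold, and the toy records are placeholders, and in a streaming pipeline you would window both branches identically so each window's model matches its data:

    import apache_beam as beam
    from apache_beam.pvalue import AsList

    THRESHOLD = 5.0  # assumed cutoff for the accumulated quality rate

    def rate(record):
        # Stand-in for the heuristic rating function from the question.
        return 5.0 if record["suspicious"] else 0.0

    with beam.Pipeline() as p:
        records = p | "Read" >> beam.Create([
            {"id": "a", "suspicious": False},
            {"id": "b", "suspicious": True},
            {"id": "b", "suspicious": True},
        ])

        # Part 1: the "model" -- keys whose accumulated rate crosses the threshold.
        bad_keys = (
            records
            | "Score" >> beam.Map(lambda r: (r["id"], rate(r)))
            | "SumPerKey" >> beam.CombinePerKey(sum)
            | "KeepBad" >> beam.Filter(lambda kv: kv[1] > THRESHOLD)
            | "Keys" >> beam.Keys()
        )

        # Part 2: filter the main input against the model via a side input.
        (records
         | "Drop" >> beam.Filter(lambda r, bad: r["id"] not in bad,
                                 bad=AsList(bad_keys))
         | "Print" >> beam.Map(print))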
