I have a large dataset (multiple nested JSON objects) which I am using to create 4 line charts with Highcharts. As the volume of data grows, the rendering time increases, which is expected. I was wondering if there is a way to render part of the data first (say, 15 days' worth), then render the rest in the background and enable the tabs (1m, 3m, 6m, ...) as the data is rendered.
Thanks,
Sukrit
Yes, it is possible. You need to prepare a script that returns only part of the data.
See the lazy-loading example: http://www.highcharts.com/stock/demo/lazy-loading
I have an UnboundedSource that generates N items (it's not in batch mode, it's a stream -- one that only generates a certain amount of items and then stops emitting new items but a stream nonetheless). Then I apply a certain PTransform to the collection I'm getting from that source. I also apply the Window.into(FixedWindows.of(...)) transform and then group the results by window using Combine. So it's kind of like this:
pipeline.apply(Read.from(new SomeUnboundedSource(...))) // extends UnboundedSource
    .apply(Window.into(FixedWindows.of(Duration.millis(5000))))
    .apply(new SomeTransform())
    .apply(Combine.globally(new SomeCombineFn()).withoutDefaults());
And I assumed that would mean new events are generated for 5 seconds, then SomeTransform is applied to the data in the 5 seconds window, then a new set of data is polled and therefore generated. Instead all N events are generated first, and only after that is SomeTransform applied to the data (but the windowing works as expected). Is it supposed to work like this? Does Beam and/or the runner (I'm using the Flink runner but the Direct runner seems to exhibit the same behavior) have some sort of queue where it stores items before passing it on to the next operator? Does that depend on what kind of UnboundedSource is used? In my case it's a generator of sorts. Is there a way to achieve the behavior that I expected or is it unreasonable? I am very new to working with streaming pipelines in general, let alone Beam. I assume, however, it would be somewhat illogical to try to read everything from the source first, seeing as it's, you know, unbounded.
An important thing to note is that windows in Beam operate on event time, not processing time. Adding 5-second windows to your data does not prescribe how the data is processed; it only determines how the results of aggregations are grouped. Further, windows only affect the data once an aggregation is reached, like your Combine.globally. Until that point in your pipeline, the windowing you applied has no effect.
As to whether it is supposed to work that way: the Beam model doesn't mandate any particular processing behavior, so other runners may process elements slightly differently. However, this is still a correct implementation. It isn't trying to read everything from the source; generally, streaming sources in Beam will attempt to read all elements currently available before moving on and coming back to the source later. If you were to adjust your source to emit elements slowly over a long period of time, you would likely see more processing in between reads from the source.
I am very new to Metal, so bear with me as I am transitioning from the ugly state machine calls of OpenGL to modern graphics frameworks. I really want to make sure I understand how everything works and works together.
I have read most of Apple's documentation, but it does a better job describing the function of individual components than how they come together.
I am essentially trying to understand whether multiple renderPipelines and renderEncoders are needed in my situation.
To describe my pipeline at a high level here is what goes on:
1. Retrieve the previous frame's contents from an offscreen texture that was rendered to and draw some new contents onto it.
2. Switch to rendering on the screen. Draw the texture from step 1 to the screen.
3. Do some post-processing (in native resolution).
4. Draw the UI on top as quads (essentially a repeat of step 2).
So in essence there will be the following vertex/fragment shader pairs
A. Draw the entities (step 1)
B. Draw quads on a specified area (steps 2 and 4)
C. Post-processing shader 1 (step 3); uses different inputs than D and can't be done in the same shader
D. Post-processing shader 2 (step 3); uses different inputs than C and can't be done in the same shader
There will be the following texture groups
Texture for each UI element
Texture for the offscreen drawing done in step 1
Potentially more offscreen textures will be used in post-processing, depending on Metal's performance
Ultimately, my points of confusion are these:
Q1. Render pipelines take only one vertex and one fragment function, so does this mean I need to have 4 render pipelines even though I only have 3 unique steps in my drawing procedure?
Q2. How am I supposed to use multiple pipelines in one encoder? Wouldn't each successive call to .setRenderPipelineState override the previous one?
Q3. Would you recommend keeping all of my .setFragmentTexture calls right after creating my encoder or do I need to set those only right before they are needed.
Q4. Is it valid to keep my depthState constant even as I switch between pipelineStates? How do I ensure that my entities on step 1 are rendered with depth but make sure depth information is lost between frames so entities are all on top of the previous contents?
Q5. What do I do with render step 3, where I have two post-processing steps? Do those have to be separate pipelines?
Q6. How can I efficiently build my pipeline knowing that steps 2 and 4 are essentially the same just with different inputs?
I guess it would help me if someone would walk me through what renderPipelineObjects I will need and for what. It would also be useful to understand what some of the renderCommandEncoder commands might look like at a pseudocode level.
Q1. Render pipelines take only one vertex and one fragment function, so does this mean I need to have 4 render pipelines even though I only have 3 unique steps in my drawing procedure?
If there are 4 unique combinations of shader functions, then it's not correct that you "only have 3 unique steps to my drawing procedure". In any case, yes, you need a separate render pipeline state object for each unique combination of shader functions (as well as for any other attribute of the render pipeline state descriptor that you need to change).
Q2. How am I supposed to use multiple pipelines in one encoder? Wouldn't each successive call to .setRenderPipelineState override the previous one?
When you send a draw method to the render command encoder, that draw command is encoded with all of the relevant current state and written to the command buffer. If you later change the render pipeline state associated with the encoder, that doesn't affect previously encoded commands; it only affects subsequently encoded commands.
Q3. Would you recommend keeping all of my .setFragmentTexture calls right after creating my encoder or do I need to set those only right before they are needed.
You only need to set them before the draw command that uses them is encoded. Beyond that, it doesn't much matter when you set them. I'd do whatever makes for the clearest, most readable code.
Q4. Is it valid to keep my depthState constant even as I switch between pipelineStates?
Yes, or there wouldn't be separate methods to set them independently. There would be a method to set both.
How do I ensure that my entities on step 1 are rendered with depth but make sure depth information is lost between frames so entities are all on top of the previous contents?
Configure the loadAction for the depth attachment in the render pass descriptor to clear with an appropriate value (e.g. 1.0). If you're using multiple render command encoders, only do this for the first one, of course. Likewise, the render pass descriptor of the last (or only) render command encoder can/should use a storeAction of .dontCare.
Q5. What do I do with render step 3, where I have two post-processing steps? Do those have to be separate pipelines?
Well, the description of your scenario is kind of vague. But, if you want to use a different shader function, then, yes, you need to use a different render pipeline state object.
Q6. How can I efficiently build my pipeline knowing that steps 2 and 4 are essentially the same just with different inputs?
Again, your description is entirely too vague to know how to answer this. In what ways are those steps the same? In what ways are they different? What do you mean about different inputs?
In any case, just do what seems like the simplest, most direct way even if it seems like it might be inefficient. Worry about optimizations later. When that time comes, open a new question and show your actual working code and ask specifically about that.
I have a dataset with potentially corrupted/malicious data. The data is timestamped. I'm rating the data with a heuristic function. After a period of time, I know that all new data items arriving with certain IDs need to be discarded, and they represent a significant portion of the data (up to 40%).
Right now I have two batch pipelines:
First one just runs the rating over the data.
The second one first filters out the corrupted data and runs the analysis.
I would like to switch from batch mode (say, running every day) into an online processing mode (hope to get a delay < 10 minutes).
The second pipeline uses a global window which makes processing easy. When the corrupted data key is detected, all other records are simply discarded (also using the discarded keys from previous days as a pre-filter is easy). Additionally it makes it easier to make decisions about the output data as during the processing all historic data for a given key is available.
The main question is: can I create a loop in a Dataflow DAG? Let's say I would like to accumulate the quality ratings given to each session window I process and, if the sum of ratings is over X, have a filter function in an earlier stage of the pipeline filter out the malicious keys.
I know about side inputs, but I don't know whether they can change at runtime.
I'm aware that a DAG by definition cannot have a cycle, but how can I achieve the same result without one?
One idea that comes to mind is to use a side output to mark an ID as malicious and to create a fake unbounded output/input pair. The output would dump the data to some storage, and the input would load it every hour and stream it so it can be joined.
Side inputs in the Beam programming model are windowed.
So you were on the right path: it seems reasonable to have a pipeline structured as two parts: 1) computing a detection model for the malicious data, and 2) taking the model as a side input and the data as a main input, and filtering the data according to the model. This second part of the pipeline will get the model for the matching window, which seems to be exactly what you want.
In fact, this is one of the main examples in the MillWheel paper (page 2), upon which Dataflow's streaming runner is based.
I'm trying to write a Rails action to stream data where the resulting CSV / XML / JSON file is much larger than the memory limit for the web server. The tricky part is that each item in the dataset is composed from two sources. One is a Postgres DB where I plan to open a CURSOR (or just use id > Y LIMIT X) to batch process the data. The latter is a custom data store but there is basically a cursor object I can use to batch that as well.
My problem is I'm not sure what the best way to iterate over the second data source is. I imagine I'll need a structure to open the cursor and as I consume the data in each batch I'll load the next batch.
This problem seems like it might have been solved already so I'm hoping there's an established pattern I can use.
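The structure described above (open a cursor, consume a batch, load the next batch as the previous one runs out) can be sketched with Ruby's `Enumerator`. This is only a minimal sketch under stated assumptions: `batched_stream`, `slices_of`, and the in-memory sample data are hypothetical stand-ins for the Postgres cursor and the custom store, and the merge assumes both sources return items in the same order with a one-to-one correspondence.

```ruby
require "json"

# Turns a "fetch the next batch" callable into a lazy stream of items.
# `fetch` is a hypothetical stand-in for a cursor; it must return []
# once the source is exhausted.
def batched_stream(fetch)
  Enumerator.new do |yielder|
    until (batch = fetch.call).empty?
      batch.each { |item| yielder << item }
    end
  end
end

# In-memory stand-in for a batched source (assumption: the real code
# would query Postgres or the custom store here instead).
def slices_of(items, size)
  queue = items.each_slice(size).to_a
  -> { queue.shift || [] }
end

db_rows     = batched_stream(slices_of([{ id: 1 }, { id: 2 }, { id: 3 }], 2))
store_items = batched_stream(slices_of([{ extra: "a" }, { extra: "b" }, { extra: "c" }], 2))

# Walks both streams in lockstep, so at most one batch per source is
# held in memory at a time. Each merged item is emitted as a JSON line.
merged = Enumerator.new do |yielder|
  db_rows.each do |row|
    yielder << row.merge(store_items.next).to_json
  end
end

merged.each { |line| puts line }
```

Because the result only needs to respond to `each` and yield strings, which is the contract Rack expects of a response body, an object like `merged` is the kind of thing that could be handed to a streaming response.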
I've got a fancy-schmancy "worksheet" style view in a Rails app that is taking way too long to load. (In dev mode, and yes I know there's no caching there, "Completed in 57893ms (View: 54975, DB: 855)") The worksheet is rendered using helper methods, because I couldn't stand maintaining umpteen teeny little partials for the different sorts of rows in the worksheet. Now I'm wondering whether partials might actually be faster?
I've profiled the page load and identified a few cases where object caching will shave a few seconds off, but the profile output suggests that a large chunk of time is spent simply looping through the Worksheet model's constituent objects and appending the string output from the helper. Here's an example of what I'm talking about:
def header_row(wksht)
  content_tag(:thead, :class => "ioe") do
    content_tag(:tr) do
      html_row = []
      for i in (0...wksht.class::NUM_COLS) do
        html_row << content_tag(:th, h(wksht.column_headings[i].upcase),
                                :class => wksht.column_classes[i])
      end
      html_row.join("\n")
    end
  end
end
OTOH using partials means opening files, spinning off the Ruby interpreter, and in the long run, aggregating a bunch of strings, right? So I'm wondering whether there is another way to speed things up in the helpers. Should I be using something like a stringstream (does that exist in Ruby?), should I get rid of content_tag calls in favor of my own "" string interpolation... I'm willing to write my own performance tests, and share the results, if you have any suggested alternatives to the approach I've already taken.
As it's a fairly complex view (and has an editable version as well), I'd rather not rewrite-and-profile the whole thing more than once. :)
Some related reading:
http://www.viget.com/extend/helpers-vs-partials-a-performance-question/ (old)
http://www.breakingpointsystems.com/community/blog/ruby-string-processing-overhead/
http://blog.purepistos.net/index.php/2008/07/14/benchmarking-ruby-string-interpolation-concatenation-and-appending/
@tadman:
There are row totals and column totals (and more columnar arithmetic), and since they're not all just totals, but also depend on other "magic numbers" from the database, I implemented them in the Ruby code rather than Javascript. (DRY and unit testable.) Javascript is used only in the edit view, and just to add/delete rows (client side only) and to fetch a sheet with fresh totals when the cell contents change. It fetches the whole table because nearly half of the values get updated when an input cell changes.
The worksheet and its rows are actually virtual models; they don't live in the DB, but rather aggregate a boatload of real AR objects. They get created every time a view renders (but that takes 1.7 secs in dev mode, so I'm not worried about it).
I suppose I could transmit a matrix of numbers, rather than marked-up content, and have JS unpack it into the sheet. But that gets unmaintainable fast.
I ended up reading an excellent article at http://www.infoq.com/articles/Rails-Performance ("A Look At Common Performance Problems In Rails"). Then I followed the author's suggestion to cache computations during request processing:
def estimated_costs
  @estimated_costs ||=
    begin
      # tedious vector math
    end
end
Because my worksheet does stuff like the above over and over, and then builds on those results to calculate some more rows, this resulted in a 90% speedup right off the bat. Should have been plain as day, but it started with just a few totals, then I showed the prototype to the customer, and it snowballed from there :)
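The effect of that `||=` caching can be seen in isolation in a small sketch. The `MemoWorksheet` class, its `computations` counter, and the doubling used in place of the real vector math are all hypothetical, made up purely for illustration:

```ruby
# Hypothetical stand-in for the worksheet model; the real "tedious
# vector math" is not shown in the post, so a doubling is used instead.
class MemoWorksheet
  attr_reader :computations

  def initialize(costs)
    @costs = costs
    @computations = 0
  end

  # Computed on the first call only; later calls return the value
  # stored in the instance variable.
  def estimated_costs
    @estimated_costs ||=
      begin
        @computations += 1
        @costs.map { |c| c * 2 }
      end
  end

  # A derived row that builds on the cached result.
  def total
    estimated_costs.sum
  end
end

sheet = MemoWorksheet.new([1, 2, 3])
sheet.estimated_costs
sheet.total
puts "computed #{sheet.computations} time(s)"
```

One caveat with this idiom: `||=` recomputes whenever the stored value is `nil` or `false`, so it only caches results that are truthy.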
I also wondered whether my array-based math might be inefficient, so I replaced the Ruby Arrays of numbers with NArray (http://narray.rubyforge.org/). The speedup was negligible but the code's cleaner, so it's staying that way.
Finally, I put some object caching in place. The "magic numbers" in the database only change a few times a year at most, and some of them are encrypted, but they need to be used in most of the calculations. That's low-hanging fruit ripe for caching, and it shaved off another 1.25 seconds.
I'll look at eager loading of associations next, as there's probably some time to save there, and I'll do a quick comparison of sending "just the data" vs sending the HTML, as @tadman suggested. About the only partial I can cache is the navigation sidebar. All of the other content depends on the request parameters.
Thanks for your suggestions!
Internally all partials are translated into a block of executable Ruby code and run through exactly the same runtime as any helper methods. Periodically you can see glimpses of this when a malformed template causes the generated code to fail to compile.
Although it stands to reason that helper methods are faster than partials, and a straightforward string interpolation is faster still, it's hard to say if the performance gain from this would make it worth pursuing. Rendering a very large number of partials can be a bottleneck in terms of logging in the development environment, but in a production environment their impact seems less severe.
The only way to figure this one out is to benchmark your pages using two different rendering methods.
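A comparison like that can be set up with Ruby's standard `Benchmark` module. The sketch below does not measure Rails helpers or partials; `row_with_join` and `row_with_append` are hypothetical, simplified row builders that only show the shape of such a test, and the iteration count is arbitrary.

```ruby
require "benchmark"

HEADINGS = %w[item qty cost total].freeze

# Variant 1: collect the cells in an array and join them at the end.
def row_with_join(headings)
  cells = headings.map { |h| "<th>#{h.upcase}</th>" }
  "<tr>#{cells.join("\n")}</tr>"
end

# Variant 2: append every piece to a single mutable string buffer.
def row_with_append(headings)
  out = +"<tr>"
  headings.each_with_index do |h, i|
    out << "\n" unless i.zero?
    out << "<th>" << h.upcase << "</th>"
  end
  out << "</tr>"
end

# Prints user, system, and real time for each variant.
Benchmark.bm(8) do |x|
  x.report("join:")   { 50_000.times { row_with_join(HEADINGS) } }
  x.report("append:") { 50_000.times { row_with_append(HEADINGS) } }
end
```

Both variants produce identical markup, so any timing difference comes from how the string is assembled. Numbers from a micro-benchmark like this will not necessarily carry over to a full page render, so it is worth timing the real view as well.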
As you point out, caching is where you get the big gains. Using Memcached to save large chunks of pre-rendered HTML content can give you dramatically faster load times. Rendering 10,000 rows into HTML will always be slower than retrieving the same snippet from the Rails.cache subsystem.
It's also the case that the content you don't render is always rendered the quickest, so anything you can do to reduce the amount of content you generate for each helper call will provide big gains. If you're building a large spreadsheet-style app that's entirely dependent on JavaScript, you may find that bundling up the data as a JSON array and expanding it client-side is significantly faster than unrolling the HTML on the server and shipping it over that way.
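As a rough illustration of the "bundle the data as JSON" idea, the server side can be as small as the sketch below. The `worksheet_payload` helper and the `headings`/`rows` key names are assumptions made up for this example; the client-side code that would expand the payload into a table is not shown.

```ruby
require "json"

# Hypothetical helper: serializes only the headings and the numeric
# cells, leaving all markup to be generated in the browser.
def worksheet_payload(headings, rows)
  { "headings" => headings, "rows" => rows }.to_json
end

payload = worksheet_payload(%w[qty cost], [[1, 2.5], [3, 4.0]])
puts payload
puts "#{payload.bytesize} bytes"
```

The trade-off is the one raised in the question: the payload is much smaller than the rendered HTML, but the layout logic then lives in JavaScript as well as in Ruby.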