BigQueryIO loads not offloading rows to GCS when early trigger occurs

I'm experimenting with BigQueryIO writes using load jobs. My load trigger is set to 18 hours, and I'm ingesting data from Kafka with a fixed daily window.
Based on https://github.com/apache/beam/blob/v2.2.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L213-L231 it seems that the intended behavior is to offload rows to the filesystem once a pane holds at least 500k records.
I produced ~600K records and waited around 2 hours to see whether the rows were uploaded to GCS, but nothing was there. I noticed that the "GroupByDestination" step in "BatchLoads" shows 0 under "Output collections" size.
With a smaller load trigger everything seems fine. Shouldn't AfterPane.elementCountAtLeast(FILE_TRIGGERING_RECORD_COUNT) have fired?
Here is the code for writing to BigQuery:
BigQueryIO
  .writeTableRows()
  .to(new SerializableFunction[ValueInSingleWindow[TableRow], TableDestination]() {
    override def apply(input: ValueInSingleWindow[TableRow]): TableDestination = {
      val startWindow = input.getWindow.asInstanceOf[IntervalWindow].start()
      val dayPartition = DateTimeFormat.forPattern("yyyyMMdd").withZone(DateTimeZone.UTC).print(startWindow)
      new TableDestination("myproject_id:mydataset_id.table$" + dayPartition, null)
    }
  })
  .withMethod(Method.FILE_LOADS)
  .withCreateDisposition(CreateDisposition.CREATE_NEVER)
  .withWriteDisposition(WriteDisposition.WRITE_APPEND)
  .withSchema(BigQueryUtils.schemaOf[MySchema])
  .withTriggeringFrequency(Duration.standardHours(18))
  .withNumFileShards(10)
The job id is 2018-02-16_14_34_54-7547662103968451637. Thanks in advance.

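Panes are per key per window, and BigQueryIO.write() with dynamic destinations uses the destination as the key under the hood, so the "500k elements in pane" threshold applies per destination per window. Records spread across several destinations or windows can therefore total more than 500k without any single pane reaching the threshold.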
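To make the per-key counting concrete, here is a toy simulation in plain Python. It is not the Beam API and does not model Beam's real trigger machinery; it only counts elements per (destination, window) pair. The threshold value is assumed to mirror FILE_TRIGGERING_RECORD_COUNT, and the table and window names are made up for illustration.

```python
from collections import Counter

THRESHOLD = 500_000  # assumption: mirrors FILE_TRIGGERING_RECORD_COUNT


def panes_that_fire(records, threshold=THRESHOLD):
    """Return the (destination, window) keys whose element count reaches the threshold.

    `records` is an iterable of (destination, window) pairs. Only the
    per-key counting is modelled here, nothing else about triggers.
    """
    counts = Counter(records)
    return sorted(key for key, n in counts.items() if n >= threshold)


# 600,000 records split evenly over two hypothetical day partitions:
# neither key reaches 500,000, so no element-count pane fires.
split = [("table$20180216", "w1")] * 300_000 + [("table$20180217", "w1")] * 300_000
print(panes_that_fire(split))

# The same 600,000 records under a single key do cross the threshold.
single = [("table$20180216", "w1")] * 600_000
print(panes_that_fire(single))
```

Under these assumptions the first call reports no firing panes even though the total exceeds the threshold, which is the situation the answer describes.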

Related

Why is the server response after a Drag&Drop so large and slow

I'm having some performance issues dragging cards between grids. From a backend perspective, storing the data from the grids after a change takes about 200ms.
But then, when the backend work seems to be done, it takes another 2.5 seconds for the frontend to get the response. The slow request contains 2 RPC events: grid-drop and grid-dragend.
The response is also unusually large, I think. Just to give you an idea, see screenshot ... notice the tiny scrollbar at the right. 🙂
TTFB is 2.42 s, and the download size is about half a MB.
Any ideas what's going on here and how I can eliminate this?
I'm using Vaadin 21.0.4, spring boot 2.5.4.
Steps I've taken to optimise performance:
Optimized the DB query and indexing
Used @Cacheable where possible
Implemented the cards using LitElement
This is the drop listener:
ComponentEventListener<GridDropEvent<Task>> dropListener = event -> {
    if (dragSource != null) {
        // The item on top of or below which the source item is dropped. Used to calculate the index of the newly dropped item(s)
        Optional<Task> targetItem = event.getDropTargetItem();
        // If the item is dropped on an existing row and the dragged items contain that same row, do nothing.
        if (targetItem.isPresent() && draggedItems.contains(targetItem.get())) {
            return;
        }
        // Add dragged items to the grid of the target room
        Grid<Task> targetGrid = event.getSource();
        Optional<Room> room = dayPlanningView.getRoomForGrid(targetGrid);
        // The items of the target Grid. Using ListDataView so this does not retrigger the query
        List<Task> targetItems = targetGrid.getListDataView().getItems().toList();
        // Calculate the position of the dropped item
        int index = targetItem.map(task -> targetItems.indexOf(task)
                + (event.getDropLocation() == GridDropLocation.BELOW ? 1 : 0))
                .orElse(0);
        room.ifPresent(r -> service.plan(draggedItems, r, index, dayPlanningView.getSelectedDate()));
        // Send event to update other users
        Optional<ScheduleUpdatedEvent> scheduleUpdatedEvent = room.map(r -> new ScheduleUpdatedEvent(PlanningMasterDetailView.this, r.getId()));
        scheduleUpdatedEvent.ifPresent(Broadcaster::broadcast);
        // Remove items from the source grid. Using a list provider so items can be removed without a DB round-trip.
        productionOrderGrid.getListDataView().removeItems(draggedItems);
    }
};
I'm a bit stuck now, as I'm kinda out of ideas 😦
Thanks

You should use TemplateRenderer/LitRenderer instead of ComponentRenderer, because the server-side components that ComponentRenderer generates are hurting performance.
Read more here: https://vaadin.com/blog/top-5-most-common-vaadin-performance-pitfalls-and-how-to-avoid-them

How do I reset the ScanStreamTransformer Accumulator?

I am writing a Flutter application using the BLOC pattern. I'm currently trying to search a database and return a list of results on the screen.
The first search will complete fine. The issue is when I hit the back button and try to do a second search. The first search results aren't cleared out. Instead they remain on the screen and the new search results are placed below them.
In a nutshell, here's what I'm doing (I'm using the RxDart library):
1.) Defining input and output streams:
PublishSubject<SearchRowModel> _searchFetcher =
    PublishSubject<SearchRowModel>();
BehaviorSubject<Map<int, SearchRowModel>> _searchOutput =
    BehaviorSubject<Map<int, SearchRowModel>>();
2.) Piping the streams together in the class constructor.
_searchFetcher.stream.transform(_resultsTransformer()).pipe(_searchOutput);
3.) Adding the results to the fetcher stream
results.searchRows.forEach((SearchRowModel row) {
  _searchFetcher.sink.add(row);
});
4.) Using a ScanStreamTransformer to create a map of the results.
_resultsTransformer() {
  return ScanStreamTransformer(
    (Map<int, SearchRowModel> cache, SearchRowModel row, index) {
      cache[index] = row;
      return cache;
    },
    <int, SearchRowModel>{},
  );
}
From my debugging, I've found that the code in step 4 appears to be the issue. That cache (or the Accumulator) isn't getting reset between searches. It just keeps adding additional results to the map.
I've yet to find a way of resetting the map in the accumulator / cache that works. I even tried completely destroying the streams and recreating them, but the original search data in the ScanStreamTransformer accumulator still persisted.

How can I track the current number of viewers of an item?

I have an iPhone app where I want to show how many people are currently viewing an item.
I'm doing that by running this transaction when people enter a view (RubyMotion code below, but it functions exactly like the Firebase iOS SDK):
listing_reference[:listings][self.id][:viewing_amount].transaction do |data|
  data.value = data.value.to_i + 1
  FTransactionResult.successWithValue(data)
end
And when they exit the view:
listing_reference[:listings][self.id][:viewing_amount].transaction do |data|
  data.value = data.value.to_i - 1
  FTransactionResult.successWithValue(data)
end
It works fine most of the time, but sometimes things go wrong: the app crashes, people lose connectivity, or similar.
I've been looking at "onDisconnect" to solve this - https://firebase.google.com/docs/reference/ios/firebasedatabase/interface_f_i_r_database_reference#method-detail - but from what I can see, there's no "onDisconnectRunTransaction".
How can I make sure that the viewing amount on the listing gets decremented no matter what?

A Firebase Database transaction runs as a compare-and-set operation: given the current value of a node, your code specifies the new value. This requires at least one round-trip between the client and server, which means that it is inherently unsuitable for onDisconnect() operations.
The onDisconnect() handler is instead a simple set() operation: when you attach the handler, you specify the write operation that should happen once the server detects that the client has disconnected (either cleanly or, as in your problem case, involuntarily).
The solution is (as is often the case with NoSQL databases) to use a data model that deals with the situation gracefully. In your case it seems most natural to not store the count of viewers, but instead the uid of each viewer:
itemViewers
  $itemId
    uid_1: true
    uid_2: true
    uid_3: true
Now you can get the number of viewers with a simple value listener:
ref.child('itemViewers').child(itemId).on('value', function(snapshot) {
  console.log(snapshot.numChildren());
});
And use the following onDisconnect() handler to clean up:
ref.child('itemViewers').child(itemId).child(authData.uid).onDisconnect().remove();
Both code snippets are in JavaScript syntax, because I only noticed you're using Swift after typing them.
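As a rough illustration of why the per-viewer data model tolerates crashes better than a counter, here is a small in-memory sketch in Python. It does not use the Firebase SDK; the ViewerRegistry class and its method names are hypothetical stand-ins for writes to itemViewers/$itemId/$uid. The point it tries to show is that adding and removing a uid are idempotent, while incrementing and decrementing a number are not.

```python
from collections import defaultdict


class ViewerRegistry:
    """Hypothetical in-memory stand-in for the itemViewers/$itemId/$uid tree."""

    def __init__(self):
        self._viewers = defaultdict(set)  # item_id -> set of viewer uids

    def enter(self, item_id, uid):
        # Stands in for writing itemViewers/<item_id>/<uid> = true.
        self._viewers[item_id].add(uid)

    def leave(self, item_id, uid):
        # Stands in for the remove() that runs on exit or on disconnect.
        # discard() is idempotent, so a repeated cleanup cannot undercount.
        self._viewers[item_id].discard(uid)

    def count(self, item_id):
        # Stands in for snapshot.numChildren().
        return len(self._viewers[item_id])


registry = ViewerRegistry()
registry.enter("item1", "uid_1")
registry.enter("item1", "uid_2")
registry.enter("item1", "uid_1")  # re-entering does not double count
registry.leave("item1", "uid_2")
registry.leave("item1", "uid_2")  # a repeated cleanup is harmless
print(registry.count("item1"))
```

With a plain counter, the duplicate enter and the duplicate leave would each have skewed the total; with a set of uids they are no-ops.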

Sales order total different with actual total

Is anyone else experiencing this issue with the Sales Order document in Acumatica ERP 4.2?
The header-level total is wrong when compared to the total of the lines. Is there any way to recalculate the totals in code? I couldn't find a fix from Acumatica yet.

If the document is not yet closed, you can just modify a quantity or add/remove a line.
If the document is closed, I do not see any way other than changing the data in the DB.

I am adding my recent experience to this topic in hopes it might help others.
Months ago, I wrote the code shown below anticipating its need when called by RESTful services. It was clearly not needed, and even worse, merely written and forgotten...
The code was from a SalesOrderEntryExt graph extension.
By removing the code block, the doubling of Order Total was resolved.
It's also an example of backing out custom code until finding the problem.
protected void _(Events.RowInserted<SOLine> e, PXRowInserted del)
{
    // call the base BLC event handler...
    del?.Invoke(e.Cache, e.Args);
    SOLine row = e.Row;
    if (!Base.IsExport) return;
    if (row != null && row.OrderQty > 0m)
    {
        // via RESTful API, raise event
        SOLine copy = Base.Transactions.Cache.CreateCopy(row) as SOLine;
        copy.OrderQty = 0m;
        Base.Transactions.Cache.RaiseRowUpdated(row, copy);
    }
}

How to implement pagination when using amazon Dynamo DB in rails

I want to use Amazon DynamoDB with Rails, but I have not found a way to implement pagination.
I will use AWS::Record::HashModel as ORM.
This ORM supports limits like this:
People.limit(10).each {|person| ... }
But I could not figure out how to implement the following MySQL query in DynamoDB.
SELECT *
FROM `People`
LIMIT 1 , 30

You issue queries using LIMIT. If the subset returned does not contain the full table, a LastEvaluatedKey value is returned. You use this value as the ExclusiveStartKey in the next query. And so on...
From the DynamoDB Developer Guide.
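The loop described above can be sketched in Python as follows. To keep the example runnable without AWS access, fake_scan is a hypothetical in-memory stand-in for a real query or scan call; it only mimics the Limit / ExclusiveStartKey / LastEvaluatedKey contract, and the "id" key name is made up.

```python
def fake_scan(items, limit, exclusive_start_key=None):
    """Hypothetical in-memory stand-in for a DynamoDB query/scan call.

    Items are dicts with an integer "id" key and are assumed sorted by it.
    """
    start = 0
    if exclusive_start_key is not None:
        # Resume strictly after the key returned with the previous page.
        ids = [item["id"] for item in items]
        start = ids.index(exclusive_start_key["id"]) + 1
    page = items[start:start + limit]
    response = {"Items": page}
    if start + limit < len(items):
        # More data remains: expose the last key of this page.
        response["LastEvaluatedKey"] = {"id": page[-1]["id"]}
    return response


def iter_pages(items, limit):
    """Yield pages until a response no longer carries LastEvaluatedKey."""
    start_key = None
    while True:
        response = fake_scan(items, limit, start_key)
        yield response["Items"]
        start_key = response.get("LastEvaluatedKey")
        if start_key is None:
            break


table = [{"id": i} for i in range(1, 8)]
for page in iter_pages(table, limit=3):
    print([item["id"] for item in page])
```

Against a real table, the body of iter_pages would stay the same in shape; only the call that produces each response would change.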

You can provide 'page-size' in your query to set the result set size.
The DynamoDB response contains 'LastEvaluatedKey', which indicates the last key for that page size. If the response doesn't contain 'LastEvaluatedKey', there are no results left to fetch.
Use the 'LastEvaluatedKey' as the 'ExclusiveStartKey' when fetching the next page.
I hope this helps.
DynamoDB Pagination

Here's a simple copy-paste-run proof of concept (Node.js) for stateless forward/reverse navigation with DynamoDB. In summary: each response includes the navigation history, allowing the user to explicitly and consistently request either the next or previous page (while next/prev params exist):
GET /accounts -> first page
GET /accounts?next=A3r0ijKJ8 -> next page
GET /accounts?prev=R4tY69kUI -> previous page
Considerations:
If your ids are large and/or users might do a lot of navigation, the next/prev params might become too large.
Yes, you do have to store the entire reverse path. If you only store the previous page marker (per some other answers), you will only be able to go back one page.
It won't handle changing pageSize midway; consider baking pageSize into the next/prev value.
Base64-encode the next/prev values; you could also encrypt them.
Scans are inefficient. While this suited my current requirement, it won't suit all!
// demo.js
const mockTable = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
const getPagedItems = (pageSize = 5, cursor = {}) => {
  // Parse cursor
  const keys = cursor.next || cursor.prev || [] // fwd first
  let key = keys[keys.length-1] || null // eg ddb's PK
  // Mock query (mimic dynamodb response)
  const Items = mockTable.slice(parseInt(key) || 0, pageSize+key)
  const LastEvaluatedKey = Items[Items.length-1] < mockTable.length
    ? Items[Items.length-1] : null
  // Build response
  const res = {items:Items}
  if (keys.length > 0) // add reverse nav keys (if any)
    res.prev = keys.slice(0, keys.length-1)
  if (LastEvaluatedKey) // add forward nav keys (if any)
    res.next = [...keys, LastEvaluatedKey]
  return res
}
// Run test ------------------------------------
const runTest = () => {
  const PAGE_SIZE = 6
  let x = {}, i = 0
  // Page to end
  while (i == 0 || x.next) {
    x = getPagedItems(PAGE_SIZE, {next:x.next})
    console.log(`Page ${++i}: `, x.items)
  }
  // Page back to start
  while (x.prev) {
    x = getPagedItems(PAGE_SIZE, {prev:x.prev})
    console.log(`Page ${--i}: `, x.items)
  }
}
runTest()
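The considerations above suggest base64-encoding the next/prev values and baking pageSize into them. One possible way to do that is sketched below in Python; the function names and the JSON field names ("keys", "pageSize") are assumptions made for this illustration, not part of the proof of concept.

```python
import base64
import json


def encode_cursor(keys, page_size):
    """Pack the key history and the page size into one URL-safe token."""
    payload = json.dumps({"keys": keys, "pageSize": page_size}, separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_cursor(token):
    """Inverse of encode_cursor; returns (keys, page_size)."""
    payload = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
    return payload["keys"], payload["pageSize"]


token = encode_cursor([6, 12], page_size=6)
keys, page_size = decode_cursor(token)
print(token)
print(keys, page_size)
```

Base64 only makes the value opaque and URL-safe; it does not protect it from tampering, so encrypting or signing the token remains a separate step if that matters.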

I faced a similar problem.
The generic pagination approach is to use a "start index" or "start page" together with a "page length".
The approach based on "ExclusiveStartKey" and "LastEvaluatedKey" is very DynamoDB-specific.
I feel this DynamoDB-specific implementation of pagination should be hidden from the API client/UI.
Also, if the application is serverless, using a service like Lambda, it will not be possible to maintain the state on the server. On the other hand, keeping it on the client makes the client implementation very complex.
I came up with a different approach, which I think is generic (and not specific to DynamoDB):
When the API client specifies the start index, fetch all the keys from the table and store them in an array.
Find the key for the client-specified start index in the array.
Use that key as the ExclusiveStartKey and fetch the number of records specified in the page length.
If the start index parameter is not present, the above steps are not needed; we don't need to specify the ExclusiveStartKey in the scan operation.
This solution has some drawbacks:
We need to fetch all the keys whenever the user paginates with a start index.
We need additional memory to store the ids and the indexes.
Additional database scan operations (one or more) are needed to fetch the keys.
But I feel this will be a very easy approach for the clients that use our APIs. Backward navigation works seamlessly, and if the user wants to see the nth page, that is possible too.
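A minimal Python sketch of the start-index approach described above follows. It runs against an in-memory list rather than DynamoDB: fetch_all_keys and fetch_after are hypothetical stand-ins for the key-collecting scan and for a scan with ExclusiveStartKey, and the "id" key name is made up. One detail here is an interpretation rather than something the description spells out: because ExclusiveStartKey is exclusive, the sketch passes the key just before the requested start index.

```python
def fetch_all_keys(table):
    """Stand-in for the extra scan(s) that collect every key (assumed sorted)."""
    return [item["id"] for item in table]


def fetch_after(table, exclusive_start_key, limit):
    """Stand-in for a scan with ExclusiveStartKey and a page length."""
    keys = fetch_all_keys(table)
    start = 0 if exclusive_start_key is None else keys.index(exclusive_start_key) + 1
    return table[start:start + limit]


def get_page(table, start_index, page_length):
    """Return `page_length` items beginning at the zero-based `start_index`."""
    if start_index <= 0:
        # No start index: scan from the beginning, no ExclusiveStartKey needed.
        return fetch_after(table, None, page_length)
    keys = fetch_all_keys(table)
    if start_index >= len(keys):
        return []
    # The start key is exclusive, so use the key just before the start index.
    return fetch_after(table, keys[start_index - 1], page_length)


table = [{"id": i} for i in range(100, 110)]
print([item["id"] for item in get_page(table, start_index=4, page_length=3)])
```

The sketch also makes the listed drawbacks visible: every request with a start index pays for collecting the full key list before the actual page is fetched.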

I faced the same problem and noticed that LastEvaluatedKey and ExclusiveStartKey do not work well, especially when using Scan, so I solved it like this:
GET /?page_no=1&page_size=10 -> first page
The response contains the count of records and the first 10 records.
Repeat, increasing the page number, until all records have been returned.
The code is below.
PS: I am using Python.
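# Hypothetical wrapper: the original fragment had no enclosing function.
def paginate(response, page_no, page_size):
    # `response` is the full scan result; slice out the requested 1-based page.
    first_index = (page_no - 1) * page_size
    second_index = page_no * page_size
    if second_index > len(response['Items']):
        second_index = len(response['Items'])
    return {
        'statusCode': 200,
        'count': response['Count'],
        'response': response['Items'][first_index:second_index]
    }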
