Best practice for solving race conditions in a pub/sub architecture - amazon-sqs

Suppose I have this microservices architecture where all services interact via pub/sub.
Basically, when a car sale is made, task 1 runs, which leads to tasks 2a & 2b, and if a leather seat is available, task 3 is executed.
Note that tasks 2a & 2b happen at the same time, since this is pub/sub.
In theory this works perfectly, but in practice I am finding cases where tasks 2b & 3 are executed before 2a. This causes task 3 to fail, since CAR 123 has not yet been created by task 2a.
I am thinking of adding a delay when task 2b is executed, but I was wondering whether there is a more standard way of solving this type of issue?
(And no, we can't change task 3 to an upsert command.)

To get this to work as you want, you can have the Sales service and the Leather Seat service publish to the same topic.
If the topic has one partition, the Sales message will always come before the message produced by the Seat service.
If the topic has multiple partitions, set the same value (some sort of ID, preferably a UUID, so you don't create hot partitions) as the partition key on both messages, so that both messages are produced to the same partition and still maintain the order you need.
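As a rough illustration only (not from the original post): the answer is phrased in Kafka-style topic/partition terms, but since the question is tagged amazon-sqs, the closest analogue is a FIFO queue, where messages sharing a MessageGroupId are delivered in order. A minimal boto3 sketch, assuming a hypothetical queue named car-events.fifo and the car ID as the ordering key:

import json
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="car-events.fifo")["QueueUrl"]  # assumed queue name

def publish(event_type, car_id, payload):
    # All messages for the same car share a MessageGroupId, so the FIFO queue
    # delivers them in the order they were sent (the "car created" event from
    # Sales before the seat event for the same car).
    sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps({"type": event_type, "car_id": car_id, **payload}),
        MessageGroupId=str(car_id),  # plays the role of the partition key
        MessageDeduplicationId=f"{event_type}-{car_id}",  # or enable content-based dedup
    )

publish("car_created", 123, {"model": "sedan"})        # from the Sales service
publish("leather_seat_available", 123, {"seat": "x"})  # from the Seat service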

Related

Issue with implementing synchronous operation in activity diagram

I want to draw activity diagram to test a specific phone. I have tried to start with synchronous operation on choosing 3 random phone . can you please check if I have done them correctly , just started learning this process and couldn't figure out if I have to use synchronous or loop for choosing 3 random phone as well
To that I am choosing 3 same type phone randomly for the test.
Initially I check for packaging damage - if it had damage then return to production manager
If no package damage - then open box and check for physical damage - if damage found then return to production manger
if no physical damage then test for functionality damage - if no damage then consider the product as Test passed.
The first horizontal bar is a fork node, meaning that the control flow is split into three concurrent threads. So A1, A2, and A3 are three actions that are executed in parallel. The second horizontal bar is a join node, meaning that once all of these actions have finished, the control flow continues (with the first decision).
I think that this is not what you want to show: The first action "Randomly take 3 samples" already gives you three samples and you would process each one of them in the same way, so you should rather have a loop with three iterations around the part with the decisions.
In general, try to distinguish between actions, objects and states - A1, A2, A3 seem to be objects and not actions, and "Test passed" is the resulting state, not an action either.
The part with the three decisions itself is fine (except that personally I like to draw the control flow either from top to bottom or from left to right, but not backwards - this is a matter of taste, though).
Here is an introduction to activity diagrams with some good examples.

How does Locust provide state over time for load testing?

I was trying to move from Gatling to Locust (Python is a nicer language) for load tests. In Gatling I can get data for charts like 'Requests per second over time', 'Response time percentiles over time', etc. ( https://gatling.io/docs/2.3/general/reports/ ) and the really useful 'Responses per second over time'.
In Locust I can see the two reports (requests, distribution), where (if I understand it correctly) 'Distribution' is the one that covers 'over time'? But I can't see where things started failing, or the early history of the test.
Is Locust able to provide 'over time' data in a CSV format (or something else easily graphable)? If so, how?
I looked through the logs; I can output the individual calls, but it would be a pain to assemble them (which would push the balance toward 'just use Gatling').
I looked over https://buildmedia.readthedocs.org/media/pdf/locust/latest/locust.pdf but am not spotting it.
I can (and have) created a loop that triggers the locust call at incremental intervals:
from os import system

increment_user_count = [1, 10, 100, 1000]
# for total_users in range(user_min, user_max, increment_count):
for users in increment_user_count:
    [...]  # assemble a per-increment locust command (like the one below) into assembled_command
    system(assembled_command)
And that works... but it loses the whole advantage of setting a spawn rate, and it would be painful for gradually incrementing up to a large number (and then having to assemble all the files back together).
Currently I am executing with something like:
locust -f locust_base_testing.py --no-web -c 1000 -r 2 --run-time 8m30s --only-summary --csv=output_stats_20190405-130352_1000
(I need to use this in automation, so the web UI is not a viable option.)
I would expect a flag, either on the command line or in some form of setup, that outputs the summary at regular intervals. Basically, I'd expect (with --no-web) to get the data I could use to replicate the graphs the web version seems to know about.
Actual: just one final summary of the overall test (and logs per individual call).

time trigger + multiple events in Pubsub

I am ingesting 2 different datasets on GCS. Let's say I publish an event e1 and an event e2, respectively, to Pub/Sub, and they happen at different times.
I want to start a job at 9 AM and check whether both events e1 and e2 have happened for that particular day; once both have (after 9 AM), I kick off a process that generates another dataset from these 2 datasets.
Is Cloud Composer the right tool to build this kind of requirement? If yes, please provide some guidance on how it can be done.
Cloud Composer (and Airflow) should be right for this use case.
You can create a DAG with a daily schedule_interval that starts at 9 AM. Use a Pub/Sub sensor per event (s1 and s2). Assuming the process that generates the other dataset is an operator, you can then ensure that generate_dataset only runs after both events by setting dependencies:
s1 >> generate_dataset
s2 >> generate_dataset
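A minimal sketch of such a DAG (not from the original answer): it assumes the Google provider's PubSubPullSensor, placeholder project and subscription names, and a hypothetical generate_dataset_fn wrapped in a PythonOperator; adjust imports and arguments to your Airflow version.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.sensors.pubsub import PubSubPullSensor

def generate_dataset_fn(**context):
    # Placeholder: combine the two ingested datasets here.
    pass

with DAG(
    dag_id="combine_datasets",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 9 * * *",  # run daily at 9 AM
    catchup=False,
) as dag:
    # Each sensor waits until at least one message is available on its subscription.
    s1 = PubSubPullSensor(task_id="wait_for_e1", project_id="my-project",
                          subscription="e1-subscription", ack_messages=True)
    s2 = PubSubPullSensor(task_id="wait_for_e2", project_id="my-project",
                          subscription="e2-subscription", ack_messages=True)

    generate_dataset = PythonOperator(task_id="generate_dataset",
                                      python_callable=generate_dataset_fn)

    s1 >> generate_dataset
    s2 >> generate_dataset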

Stream de-duplication on Dataflow | Running services on Dataflow workers

I want to de-dupe a stream of data based on an ID, in a windowed fashion. Each element in the stream we receive carries an ID, and we want to remove elements with matching IDs within N-hour time windows. A straightforward approach is to use an external key store (Bigtable or something similar) where we look up keys and write them if required, but our QPS is extremely large, which makes maintaining such a service pretty hard. The alternative approach I came up with was to do a groupBy within a time window, so that all data for a user within that window falls into the same group, and then, within each group, use a separate key-store service to look up duplicates by key. So, I have a few questions about this approach:
[1] If I run a groupBy transform, is there any guarantee that each group will be processed on the same worker? If so, we can group by the user ID and then, within each group, compare the session IDs for each user.
[2] If that is feasible, my next question is whether we can run other services on each of the worker machines that run the job - in the example above, I would like to have a local Redis running that each group could then use to look up or write IDs.
The idea seems a bit outside what Dataflow is supposed to do, but I believe such use cases should be common - so if there is a better model to approach this problem, I am looking forward to that too. We essentially want to avoid external lookups as much as possible, given the amount of data we have.
1) In the Dataflow model, there is no guarantee that the same machine will see all the groups for a key across windows. Imagine that a VM dies, or new VMs are added and work is rebalanced across them for scaling.
2) You're welcome to run other services on the Dataflow VMs, since they are general purpose, but note that you will have to contend with the resource requirements of the other applications on the host, potentially causing out-of-memory issues.
Note that you may want to take a look at RemoveDuplicates and use that if it fits your use case.
It also seems like you might want to use session windows to de-dupe elements. You would call:
PCollection<T> pc = ...;
PCollection<T> windowed_pc = pc.apply(
    Window.<T>into(Sessions.withGapDuration(Duration.standardHours(N))));
Each new element keeps extending the length of the window, so it won't close until the gap passes. You can also apply a speculative early firing of one element (AfterPane.elementCountAtLeast(1)) together with an AfterWatermark trigger on a downstream GroupByKey. The trigger will fire as soon as it can, which is once it has seen at least one element, and then once more when the session closes. After the GroupByKey you would have a DoFn that filters out any element that isn't part of an early firing, based upon the pane information ([3], [4]).
DoFn(T -> KV<session key, T>)
|
\|/
Window.into(Session window)
|
\|/
Group by key
|
\|/
DoFn(Filter based upon pane information)
It is sort of unclear from your description; can you provide more details?
Sorry for not being clear. I gave the setup you mentioned a try, except for the early and late firings part, and it is working on smaller samples. I have a couple of follow-up questions related to scaling this up. Also, I was hoping I could give you more information on the exact scenario.
So, we have an incoming data stream, each item of which can be uniquely identified by its fields. We also know that duplicates occur pretty far apart and, for now, we care about those within a 6-hour window. Regarding the volume of data, we have at least 100K events every second, spanning about a million different users - so within this 6-hour window, we could get a few billion events into the pipeline.
Given this background, my questions are:
[1] For the sessioning to happen by key, I should run it on something like
PCollection<KV<key, T>> windowed_pc = pc.apply(
    Window.<KV<key, T>>into(Sessions.withGapDuration(Duration.standardHours(6))));
where key is a combination of the 3 IDs I mentioned earlier. Based on the definition of Sessions, only if I run it on this KV would I be able to manage sessions per key. This would mean that Dataflow would have too many open sessions at any given time, waiting for them to close, and I was worried whether it would scale or I would run into bottlenecks.
[2] Once I perform the sessioning as above, I have already removed the duplicates based on the firings, since I only care about the first firing in each session, which already discards duplicates. I no longer need the RemoveDuplicates transform, which I found is a combination of the (WithKeys, Combine.PerKey, Values) transforms in order, essentially performing the same operation. Is this the right assumption to make?
[3] If the solution in [1] is going to be a problem, the alternative is to reduce the sessioning key to just (user-id, session-id), ignoring the sequence-id, and then run a RemoveDuplicates by sequence-id on top of each resulting window. This might reduce the number of open sessions, but would still leave a lot of them (#users * #sessions per user), which can easily run into the millions. FWIW, I don't think we can session only by user-id, since then the session might never close, as different sessions for the same user could keep coming in, and determining the session gap in that scenario also becomes infeasible.
I hope my problem is a little clearer this time. Please let me know whether any of my approaches make the best use of Dataflow or if I am missing something.
Thanks
I tried out this solution at a larger scale and, as long as I provide sufficient workers and disks, the pipeline scales well, although I am seeing a different problem now.
After this sessionization, I run a Combine.perKey on the key and then a ParDo which looks at c.pane().getTiming() and rejects anything other than an EARLY firing. I tried counting both EARLY and ONTIME firings in this ParDo, and it looks like the on-time panes are actually de-duped more thoroughly than the early ones. I mean, the early firings still contain some duplicates, whereas the on-time firing count is lower, with more duplicates removed. Is there any reason this could happen? Also, is my approach to de-duping with a Combine + ParDo the right one, or could I do something better?
events.apply(
    WithKeys.<String, EventInfo>of(new SerializableFunction<EventInfo, String>() {
        @Override
        public java.lang.String apply(EventInfo input) {
            return input.getUniqueKey();
        }
    })
)
.apply(
    Window.named("sessioner").<KV<String, EventInfo>>into(
        Sessions.withGapDuration(mSessionGap)
    )
    .triggering(
        AfterWatermark.pastEndOfWindow()
            .withEarlyFirings(AfterPane.elementCountAtLeast(1))
    )
    .withAllowedLateness(Duration.ZERO)
    .accumulatingFiredPanes()
);

Neo4j and concurrent writes, how to handle deadlock issues, preferably server-side?

I'm running into issues with deadlocks during concurrent merge operations (REST API). I have a function that processes text with some metadata, and for each item in the metadata dictionary, I'm performing a merge to add either a node or connect the text node with the metadata[n] node. Issues come up when the message rate is around 500-1000 per second.
In this particular function, there are 11 merges across 6 queries, which go something like this:
q1 = "MERGE (n:N { id: {id} }) ON CREATE SET ... ON MATCH SET "
"WITH . MERGE (...) "
"WITH ., . MERGE (.)-[:some_rel]->(.)"
params = {'the': 'params'}
cypher.execute(q1, params)
if some_condition:
q2 = "MATCH (n:N { id: {id} }) ... "
"WITH n, . MERGE (n)-[:some_rel]->(.)"
params = {'more': 'params'}
cypher.execute(q2, params)
if some_condition2:
q3
...
if some_condition_n:
qn
I'm running the above with Python, via Celery (for those not familiar with Celery, it's a distributed task queue). When the issue first came up, I was executing the above in a single transaction and had a ton of failures due to deadlock exceptions. My initial thought was simply to implement a distributed blocking lock at the function level with Redis. That, however, causes a bottleneck in my app.
Next, I switched from a single Cypher transaction to a few atomic transactions, as in the above, and removed the lock. This takes care of the bottleneck and greatly reduces the number of deadlock exceptions, but they're still occurring, albeit at a reduced level.
Graph databases aren't really my thing, so I don't have a ton of experience with the ins and outs of Neo4j and Cypher. I have a secondary index in Redis of the UUIDs of existing nodes, so there is a pre-processing step prior to the merges to try to keep the graph access down. Are there any obvious solutions I should try? Maybe there's some way to queue the operations on the graph side, or maybe I'm overlooking some server optimizations? Any advice on where to look would be appreciated. Thanks!
Okay, after thinking about this some more, I realized that the way my queries were executed was inefficient and could do with some refactoring. Since all the queries are within the same general context, there is no reason to execute them individually, or even to open a transaction and have them executed that way.
Instead, I changed the function to go through the conditionals, concatenate the query strings into one long string, and add the params I need to the param dictionary. So now there's only one execution at the end, with one statement. This also removes some of the MATCH statements.
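A hedged sketch of what that refactor might look like (the query fragments, condition names, variables, and the cypher.execute call mirror the redacted snippets above and are placeholders, not the actual application code):

query_parts = ["MERGE (n:N { id: {id} }) ON CREATE SET ... ON MATCH SET ..."]
params = {'id': node_id}

if some_condition:
    # Append the fragment instead of issuing a separate query.
    query_parts.append("WITH n MERGE (n)-[:some_rel]->(m:M { id: {m_id} })")
    params['m_id'] = metadata_id

if some_condition2:
    query_parts.append("WITH n MERGE (n)-[:other_rel]->(o:O { id: {o_id} })")
    params['o_id'] = other_id

# Single round trip: one statement, executed once at the end.
cypher.execute(" ".join(query_parts), params)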
This refactor doesn't wholly fix the issue, though, as there are still some deadlock exceptions being thrown.
I think I found the issue - namely, that there wasn't really an issue to begin with. That is:
The Enterprise version of Neo4j has an alternative lock manager to the Community version, meant to provide scalable locking on high-CPU-count machines.
The Enterprise Lock Manager uses a deadlock detection algorithm that does not require (much) synchronization, which gives it some very desirable scalability attributes. The drawback is that it may sometimes detect false-positives. This normally does not happen in production usage, but becomes evident in stress testing individual operations. These scenarios see much lower churn in CPU cache invalidation, which the enterprise lock manager needs to communicate across cores.
As a deadlock detection error is a safe-to-retry error and the user is expected to handle these in all application code, since there may be legitimate deadlocks at any time, this behavior is actually by design to gain scalability.
I simply catch the exception and retry after a few seconds.
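For illustration only, a minimal sketch of that catch-and-retry approach, written against the official neo4j Python driver (the question used a different client); the URI, credentials, query, and retry parameters are placeholders.

import time

from neo4j import GraphDatabase
from neo4j.exceptions import TransientError  # deadlock-detected errors are transient

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def run_with_retry(query, params, retries=3, delay=2.0):
    # Deadlock detection is safe to retry, so back off briefly and try again.
    for attempt in range(retries):
        try:
            with driver.session() as session:
                return session.run(query, params).data()
        except TransientError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)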
