Using SimpleDB NextToken when records in query are updated

I have a case where we are doing a select on a domain like:
select * from mydomain where some_val = 'foo' and some_date < '2012-03-01T00:00+01:00'
When iterating the results of this query, we do some work and then update each row, setting the field some_date to the current date/time to mark it as processed.
The question I have is: will the NextToken request break when it returns to SimpleDB to get the next set of records? When it returns for the next batch, all of the records from the first batch will have a some_date value that is no longer within the original query range.
I don't know how NextToken is implemented, so I can't tell whether it is just a pointer to the next item or some kind of offset that might "skip" a whole batch of records.
So if we retrieved 3 records at a time and I had this in my domain:
record 1, '2012-01-12T19:20+01:00'
record 2, '2012-02-14T19:20+01:00'
record 3, '2012-01-22T19:20+01:00'
record 4, '2012-01-21T19:20+01:00'
record 5, '2012-02-22T19:20+01:00'
record 6, '2012-01-20T19:20+01:00'
record 7, '2012-01-18T19:20+01:00'
record 8, '2012-01-17T19:20+01:00'
record 9, '2012-02-12T19:20+01:00'
On my first execution I would get records 1, 2, 3.
If I set their some_date field to '2012-03-12T19:20+01:00' before returning for the NextToken batch, would the NextToken request then return 4, 5, 6? Or would it return 7, 8, 9 (because the token was set to start at the 4th record, and records 1, 2, 3 are no longer in the result set)?
If it is important: we are using the boto library (Python).

would the next-token request then return 4,5,6? Or would it return
7,8,9 [...]?
Good question; this can indeed be a bit confusing. Still, anything but the former (i.e. 4, 5, 6) wouldn't make sense for practical usage, and Amazon SimpleDB works accordingly; see Select:
Operations that run longer than 5 seconds return a time-out error
response or a partial or empty result set. Partial and empty result
sets contain a NextToken value, which allows you to continue the
operation from where it left off [emphasis mine]
Please take note of the additional note in the Request Parameters section though, which might come as a surprise:
Note
The response to a Select operation with ConsistentRead set to
true returns a consistent read. However, for any following Select
operation requests that include a NextToken value, Amazon SimpleDB
ignores the ConsistentRead field, and the subsequent results are
eventually consistent. [emphasis mine]
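
In boto terms this means you can simply keep iterating: domain.select() returns a lazy result set that follows NextToken for you between batches. A minimal sketch of the pattern from the question (credentials and the exact timestamp format are assumed):
import boto
from datetime import datetime

conn = boto.connect_sdb()  # reads credentials from the environment/boto config
domain = conn.get_domain('mydomain')

query = ("select * from mydomain "
         "where some_val = 'foo' and some_date < '2012-03-01T00:00+01:00'")

# The result set lazily follows NextToken between batches, so updating
# already-processed items does not make the iteration skip ahead.
for item in domain.select(query):
    # ... do the actual processing work here ...
    item['some_date'] = datetime.utcnow().isoformat()  # mark as processed
    item.save()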


How to make transactions running a particular query execute sequentially in Neo4j?

I'm working on an app where users can un-bookmark a post they've bookmarked before. But I realized that if multiple requests are sent by a particular user to un-bookmark the same post, node properties get set multiple times. For example, if user 1 bookmarks post 1, the noOfBookmarks property (on both the user and the post) increases by 1, and when they un-bookmark it, noOfBookmarks decreases by 1. But during concurrent requests I sometimes get incorrect or negative noOfBookmarks values, depending on the number of requests. I'm using MATCH, which returns 0 rows when the pattern can't be found.
I think the problem is the isolation level Neo4j uses. During concurrent requests, the changes made by the first query to run are not visible to other transactions until the first transaction is committed, so the MATCH still returns rows; that's why I'm getting invalid properties. I think what I need is for the transactions to be executed sequentially, or to get an exclusive read lock.
I've tried setting a property on the user and post nodes (before MATCHing the bookmark relationship), which should make the first transaction take a write lock on those nodes, as in the sketch below. I thought other transactions would wait at that point for the write lock to be released before continuing, but it didn't work.
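This is roughly what that locking attempt looked like (the _lock property name is just a placeholder):
MATCH (user:User { id: $userId })
MATCH (post:Post { id: $postId })
// writing any property takes write locks on both nodes for this transaction
SET user._lock = true, post._lock = true
WITH user, post
MATCH (user)-[bookmarkRel:BOOKMARKED_POST]->(post)
WITH bookmarkRel, user, post
DELETE bookmarkRel
WITH post, user
SET post.noOfBookmarks = post.noOfBookmarks - 1,
    user.noOfBookmarks = user.noOfBookmarks - 1
RETURN post { .* }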
How do I ensure that, during concurrent requests, the first transaction modifies the graph and the other transactions stop at that MATCH (which is the behaviour during sequential requests)?
This is my cypher query:
MATCH (user:User { id: $userId })
MATCH (post:Post { id: $postId })
WITH user, post
MATCH (user)-[bookmarkRel:BOOKMARKED_POST]->(post)
WITH bookmarkRel, user, post
DELETE bookmarkRel
WITH post, user
SET post.noOfBookmarks = post.noOfBookmarks - 1,
    user.noOfBookmarks = user.noOfBookmarks - 1
RETURN post { .* }
Thank you

First timestamp of max value in Prometheus graph

I am trying to find the timestamp of the first occurrence of the max value (the cursor point in the image below).
I wrote this query:
min(timestamp(max(jmeter_count_total{label="GET - Company Updates - ua_users_company-updates"})))
But it returns the latest timestamp of the max value.
I am not able to grab the value highlighted by the cursor in the image below (the minimum timestamp). Instead, I get the highest value when I use the above query.
I've played with this for a bit and I think this may work (take it with a grain of salt, due to limited testing).
TL;DR - the query (using just foo as the metric name for brevity):
min_over_time((timestamp(foo) and (foo == scalar(max_over_time(foo[2h]))))[1h:])
This portion of the query:
foo == scalar(max_over_time(foo[2h]))
returns only values where foo matches the max value of foo over the last 2h interval. To retrieve the timestamps of those cases, we use the timestamp function with the previous clause as a condition:
timestamp(foo) and (foo == scalar(max_over_time(foo[2h])))
Finally we only want to get the first/lowest timestamp value over the time window, which is what the outer min_over_time with the nested subquery should do.
I fiddled with the online Prometheus demo using one of the metrics present there. You can check the queries here.

How to find document with SOLR query and exact string match

Considering a simple table:
CREATE TABLE transactions (
enterprise_id uuid,
transaction_id text,
state text,
PRIMARY KEY ((enterprise_id, transaction_id))
);
and Solr core with default, auto-generated parameters.
How do I construct a Solr query that will find me record(s) in this table that have state value exact match to an input, considering the state can be arbitrary string?
I tried this with state value of a+b. This works fine with q=state:"a+b", but that creates a "phrase query":
"rawquerystring": "state:\"a+b\"",
"querystring": "state:\"a+b\"",
"parsedquery": "PhraseQuery(state:\"a b\")",
"parsedquery_toString": "state:\"a b\"",
So the same record is found if I use a query like q=state:"a(b", which results in the same phrase query and finds the record with state a+b. That is unacceptable to me, because I need an exact match.
I went through https://cwiki.apache.org/confluence/display/solr/Other+Parsers, and tried using q={!term f=state}a+b or q={!raw f=state}a+b, but neither even finds my sample transaction record.
Most likely your state field was generated as a TextField, where standard tokenization (StandardTokenizer) is applied: the value is split on + and the plus sign itself is discarded. You could use a different tokenizer (whitespace?) or just make state a StrField to get an exact match.
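For illustration, the relevant schema change might look like this (assuming the stock string type backed by solr.StrField is defined in your schema):
<!-- schema.xml: an StrField indexes the raw value verbatim, with no tokenization -->
<field name="state" type="string" indexed="true" stored="true"/>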
This works for me with state as an StrField:
select * from transactions where solr_query='state:a+b';

breezejs: inlineCount when using FetchStrategy.FromLocalCache

Would it make sense to add inlineCount to the result set when the data comes from the cache instead of the server?
I store the count locally, so as long as I don't leave the current URL (I use AngularJS), I can get it from a variable in my controller. But once I've left that URL and come back to it, the data still comes from the cache, while my local variable has been reset to its initial value.
Update 12 March 2015
The requisite changes have been committed and should make the next release (after 1.5.3).
Of course the inlineCount is only available in async query execution.
The synchronous em.executeQueryLocally(query) returns immediately with the paged result array; there's no place to hold the inlineCount.
Here's an excerpt from the new, corresponding test in DocCode.queryTests: "can page queried customers in cache with inline count"
var query = EntityQuery.from('Customers')
    .where('CompanyName', 'startsWith', 'A')
    .orderBy('CompanyName')
    .skip(2).take(2)
    .inlineCount()
    .using(breeze.FetchStrategy.FromLocalCache);

return em.executeQuery(query)
    .then(localQuerySucceeded);

function localQuerySucceeded(data) {
    var custs = data.results;
    var count = custs.length;
    equal(count, 2,
        "have full page of cached 'A' customers now; count = " + count);
    var inlineCount = data.inlineCount;
    ok(inlineCount && inlineCount > 2,
        'have inlineCount=' + inlineCount + ' which is greater than page size');
}
Update 9 May 2014
After more reflection, I've decided that you all are right and I have been wrong. I have entered feature request #2267 in our internal tracking system. No promise as to when we'll get it done (soon I trust); stay on us.
Original
I don't think it makes sense to support inlineCount for cache queries because Breeze can't calculate that value.
Walk this through with me. You issue a paged query to the server with the inline count and a page size of 5 items.
// Get the first 5 Products beginning with 'C'
// and also get the total of all products beginning with 'C'
var query = EntityQuery.from("Products")
.where("ProductName", "startsWith", "C")
.take(5)
.inlineCount()
.using(manager);
Suppose you run this once and the server reports that there are 142 'C' products in the database.
Ok now take that same query and execute it locally:
query.using(FetchStrategy.FromLocalCache).execute().then(...).fail(...);
How is Breeze supposed to know the count? It only has 5 entities in cache. How can it know that there are 142 'C' products in the database? It can't.
I think your best option is to check the returned data object to see if it has an inlineCount value. Only reset your bound copy of the count if the query went remote.
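For example (vm.totalCount standing in for whatever variable you bind the count to):
em.executeQuery(query).then(function (data) {
    // inlineCount is only defined when the query actually went to the server
    if (data.inlineCount !== undefined) {
        vm.totalCount = data.inlineCount;
    }
    vm.items = data.results;
});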

How do you set the value of a newly added ActiveRecord counter cache?

I have a model object which did not have a counter cache on it before, and I added one via a migration. The thing is, in that migration I tried and failed to set the starting value of the counter cache based on the number of child objects I already had. Any attempt to update the cache value did not get written to the database. I even tried it from the console, but it was never going to happen: any attempt to write directly to that value on the parent was ignored.
Changing the number of children updated the counter cache (as it should), and removing the ":counter_cache => true" from the child would let me update the value on the parent. But that's cheating. I needed to be able to add the counter cache and then set its starting value to the number of children in the migration so I could then start with correct values for pages which would show it.
What's the correct way to do that so that ActiveRecord doesn't override me?
You want to use the update_counters method; this blog post has more details:
josh.the-owens.com add a counter cache to an existing db-table
This RailsCast on the topic is also a good resource:
http://railscasts.com/episodes/23-counter-cache-column
The canonical way is to use reset_counters, i.e.:
Author.find_each do |author|
  Author.reset_counters(author.id, :books)
end
...and that's how you should do it if those tables are of modest size, i.e. <= 1,000,000 rows.
BUT: for anything large this will take on the order of days, because it requires two queries per row and fully instantiates a model each time.
Here's a way to do it about 5 orders of magnitude faster:
Author
  .joins(:books)
  .select("authors.id, authors.books_count, count(books.id) as count")
  .group("authors.id")
  .having("authors.books_count != count(books.id)")
  .pluck(:id, :books_count, "count(books.id)")
  .each_with_index do |(author_id, old_count, fixed_count), index|
    puts "at index %7i: fixed author id %7i, new books_count %4i, previous count %4i" % [index, author_id, fixed_count, old_count] if index % 1000 == 0
    Author.update_counters(author_id, books_count: fixed_count - old_count)
  end
It's also possible to do this directly in SQL using just a single query, but the above worked well enough for me. Note the somewhat convoluted way it uses the difference between the previous count and the correct one: this is necessary because update_counters doesn't allow setting an absolute value, only increasing or decreasing it by an amount; the column is otherwise marked readonly.
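For completeness, a sketch of the single-query SQL variant mentioned above (PostgreSQL syntax; table and column names assumed to match the Ruby example):
-- recompute books_count in one statement for every author that has books
UPDATE authors
SET books_count = sub.cnt
FROM (
  SELECT author_id, COUNT(*) AS cnt
  FROM books
  GROUP BY author_id
) AS sub
WHERE authors.id = sub.author_id
  AND authors.books_count IS DISTINCT FROM sub.cnt;

Authors with no books at all are not touched by the join; they would need a separate UPDATE setting books_count to 0.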
