AWS SimpleDB - how to create an offset to address default result limit of 100

AWS SimpleDB has a default result limit of 100 and I am trying to get the last 20 records from my database. Their online help states: "The next token returned by count(*) and select are interchangeable as long as the where and order by clauses match. For example, if you want to return the 200 items after the first 10,000 (similar to an offset), you can perform a count with a limit clause of 10,000 and use the next token to return the next 200 items with select."
So I'm trying to work out how to perform these two operations. I start by getting a count of the number of records (about 160 in my example), then subtract 20 from it to create the offset (140 in this case). But AWS is still returning the first 100 records. I presume the code between "//++" isn't correct in creating the offset.
//++ AWS has display limit of 100 so create offset by starting from count - 20
NSString *sdbAppOffset = [NSString stringWithFormat:@"select count(*) from %@ limit %i", sdbAppUse, iDbaseRecordCountOffset];
SimpleDBSelectRequest *selectOffsetRequest = [[SimpleDBSelectRequest alloc] initWithSelectExpression:sdbAppOffset];
selectOffsetRequest.consistentRead = YES;
SimpleDBSelectResponse *selectOffSetResponse = [sdbClient select:selectOffsetRequest];
//NSLog(@"sdbAppUse count %i",[selectOffSetResponse.items count]);
//++
NSString *sdbAppUseString = [NSString stringWithFormat:@"select * from %@", sdbAppUse];
SimpleDBSelectRequest *selectRequest = [[SimpleDBSelectRequest alloc] initWithSelectExpression:sdbAppUseString];
selectRequest.consistentRead = YES;
SimpleDBSelectResponse *selectResponse = [sdbClient select:selectRequest];
NSLog(@"sdbAppUse count %i", [selectResponse.items count]);
for (int x = 0; x < [selectResponse.items count]; x++) {
    SimpleDBItem *registerUser = [selectResponse.items objectAtIndex:x];
    for (SimpleDBAttribute *attribute in registerUser.attributes) {
        NSLog(@"trackUsage registerUser |%@|, name |%@|, value |%@|", registerUser.name, attribute.name, attribute.value);
    }
}
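For reference, the flow the quoted documentation describes (count with a limit equal to the offset, then reuse its nextToken on the real select) looks roughly like this. It is an untested sketch using AWS SDK for Java class names; the iOS SDK's SimpleDBSelectRequest/SimpleDBSelectResponse should expose an equivalent nextToken property, and the domain name and offset value below are placeholders.
import com.amazonaws.services.simpledb.AmazonSimpleDBClient;
import com.amazonaws.services.simpledb.model.Item;
import com.amazonaws.services.simpledb.model.SelectRequest;

AmazonSimpleDBClient sdb = new AmazonSimpleDBClient();   // credential setup omitted
String domain = "MyDomain";                              // hypothetical domain name
int offset = 140;                                        // e.g. 160 records total, last 20 wanted

// Step 1: count(*) with LIMIT equal to the offset; SimpleDB returns a nextToken at that position.
SelectRequest countRequest = new SelectRequest("select count(*) from " + domain + " limit " + offset)
        .withConsistentRead(true);
String nextToken = sdb.select(countRequest).getNextToken();

// Step 2: run the real select with matching where/order by clauses (none here) and the token.
SelectRequest selectRequest = new SelectRequest("select * from " + domain)
        .withConsistentRead(true)
        .withNextToken(nextToken);
for (Item item : sdb.select(selectRequest).getItems()) {
    System.out.println(item.getName());                  // the remaining ~20 items
}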

Related

Limit of 1024 stream entries in the handler in DolphinDB subscription?

n=1000000
// the trade columns (price, qty etc.) are assumed here; the original post used colNames/colTypes before defining them
tmpTrades = table(n:0, `time`sym`price`qty, [TIME,SYMBOL,DOUBLE,INT])
lastMinute = [00:00:00.000]
colNames = `time`sym`vwap
colTypes = [MINUTE,SYMBOL,DOUBLE]
enableTableShareAndPersistence(table=streamTable(n:0, colNames, colTypes), tableName="vwap_stream")
go
def calcVwap(mutable vwap, mutable tmpTrades, mutable lastMinute, msg){
    tmpTrades.append!(msg)
    curMinute = time(msg.time.last().minute()*60000l)
    t = select wavg(price, qty) as vwap from tmpTrades where time < curMinute, time >= lastMinute[0] group by time.minute(), sym
    if(t.size() == 0) return
    vwap.append!(t)
    t = select * from tmpTrades where time >= curMinute
    tmpTrades.clear!()
    lastMinute[0] = curMinute
    if(t.size() > 0) tmpTrades.append!(t)
}
subscribeTable(tableName="trades_stream", actionName="vwap", offset=-1, handler=calcVwap{vwap_stream, tmpTrades, lastMinute}, msgAsTable=true)
This is what I wrote to subscribe to the stream. Even though data is ingested into the publishing table trades_stream at 5000 records per batch, at most 1024 records at a time reach the handler. Is there a limit on the subscription?
You can modify the configuration parameter maxMsgNumPerBlock to change the maximum number of records in a message block; the default value is 1024.
In standalone mode the configuration file is dolphindb.cfg; in cluster mode it is cluster.cfg.
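For example, to let a message block carry the full 5000-record batch described above, you could add the following line to that configuration file and restart the node (5000 is just the value matching this question; pick whatever block size you need):
maxMsgNumPerBlock=5000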

Neo4j : Difference between cypher execution and Java API call?

Neo4j : Enterprise version 3.2
I see a tremendous difference in speed between the following two calls. Here are the settings and the query/API.
Page Cache : 16g | Heap : 16g
Number of row/nodes -> 600K
Cypher code (ignore any syntax slips) | Time taken: 50 sec.
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///xyx.csv' AS row
CREATE (n:ObjectTension) SET n = row
From Java (session pool, with 15 sessions at a time, as an example):
Thread_1 : Time Taken : 8 sec / 10K
Map<String, Object> pList = new HashMap<String, Object>();
Map<String, Object> params = new HashMap<String, Object>();
try (Transaction tx = Driver.session().beginTransaction()) {
    for (int i = 0; i < 10000; i++) {
        pList.put(String.valueOf(i), i * i);   // note: pList keeps growing on each iteration
        params.put("props", pList);
        String query = "CREATE (n:Label {props})";
        // String query = "CREATE (n:Label) SET n = {props}";
        tx.run(query, params);
    }
    tx.success();
}
Thread_2 : Time taken is 9 sec / 10K
// same code as Thread_1, running in its own session and transaction
...
Thread_3 : basically the same code is reused; it's just an example.
Thread_N where N = (600K / 10K)
Hence, the overall time taken is around 2 to 3 minutes.
My questions are the following:
How does the CSV load handle this internally? Does it open a single session with multiple transactions inside it?
Or does it create multiple sessions based on the USING PERIODIC COMMIT 10000 parameter, i.e. 600K/10000 = 60 sessions, etc.?
What's the best way to write this via Java?
The idea is to achieve the same write performance via Java as with the CSV load, which loads 12,000 nodes in ~5 seconds, or even better.
Your Java code is doing something very different than your Cypher code, so it really makes no sense to compare processing times.
You should change your Java code to read from the same CSV file. File IO is fairly expensive, but your Java code is not doing any.
Also, whereas your pure Cypher query creates nodes with a fixed (and presumably relatively small) number of properties, your Java pList grows with every loop iteration, so each iteration creates a node with anywhere from 1 to 10K properties! This may be the main reason why your Java code is much slower.
[UPDATE 1]
If you want to ignore the performance difference between using and not using a CSV file, the following (untested) code should give you an idea of what similar logic would look like in Java. In this example, the i loop assumes that your CSV file has 10 columns (you should adjust the loop to use the correct column count). Also, this example gives all the nodes the same properties, which is OK as long as you have not created a contrary uniqueness constraint.
Session session = Driver.session();
Map<String, Object> pList = new HashMap<String, Object>();
for (int i = 0; i < 10; i++) {
    pList.put(Integer.toString(i), i * i);
}
Map<String, Object> params = new HashMap<String, Object>();
params.put("props", pList);
String query = "CREATE (n:Label) SET n = {props}";
for (int j = 0; j < 60; j++) {
    try (Transaction tx = session.beginTransaction()) {
        for (int k = 0; k < 10000; k++) {
            tx.run(query, params);
        }
        tx.success();   // mark the transaction to be committed on close
    }
}
[UPDATE 2 and 3, copied from chat and then fixed]
Since the Cypher planner is able to optimize, the actual internal logic is probably a lot more efficient than the Java code I provided (above). If you want to also optimize your Java code (which may be closer to the code that Cypher actually generates), try the following (untested) code. It sends 10000 rows of data in a single run() call, and uses the UNWIND clause to break it up into individual rows on the server.
Session session = Driver.session();
Map<String, Integer> pList = new HashMap<String, Integer>();
for (int i = 0; i < 10; i++) {
    pList.put(Integer.toString(i), i * i);
}
List<Map<String, Integer>> rows = Collections.nCopies(10000, pList);
Map<String, Object> params = new HashMap<String, Object>();
params.put("rows", rows);
String query = "UNWIND {rows} AS row CREATE (n:Label) SET n = row";
for (int j = 0; j < 60; j++) {
    try (Transaction tx = session.beginTransaction()) {
        tx.run(query, params);
        tx.success();   // commit the 10000-row batch
    }
}
You can try creating the nodes using the Java API, instead of relying on Cypher (see the sketch below):
createNode - http://neo4j.com/docs/java-reference/current/javadocs/org/neo4j/graphdb/GraphDatabaseService.html#createNode-org.neo4j.graphdb.Label...-
setProperty - http://neo4j.com/docs/java-reference/current/javadocs/org/neo4j/graphdb/PropertyContainer.html#setProperty-java.lang.String-java.lang.Object-
Also, as the previous answer mentioned, the props variable has different values across your cases.
Additionally, notice that you perform query parsing on every iteration (String query = "Create(n:Label {props})";), unless it is optimized out by Neo4j itself.
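For what it's worth, a minimal sketch of that createNode/setProperty route might look like the following. It assumes the embedded Java API (a GraphDatabaseService instance named db), not the Bolt driver, and the property name is purely illustrative:
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;

// db is an already-started embedded GraphDatabaseService (assumption)
try (Transaction tx = db.beginTx()) {
    for (int i = 0; i < 10000; i++) {
        Node n = db.createNode(Label.label("ObjectTension"));
        n.setProperty("value", i * i);   // hypothetical property; use your real CSV columns here
    }
    tx.success();
}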

I need to get more than 100 pages in my query

I want to get as much video information as possible from YouTube for my project. I know that the page limit is 100.
Here is the code I wrote:
ArrayList<String> videos = new ArrayList<>();
int videosTotales = 0;
int i = 1;
while (i <= 100) {
    String peticion = "http://gdata.youtube.com/feeds/api/videos?category=Comedy&alt=json&max-results=50&page=" + i;
    URL oracle = new URL(peticion);
    URLConnection yc = oracle.openConnection();
    BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
    StringBuilder respuesta = new StringBuilder();
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        respuesta.append(inputLine);
    }
    in.close();
    JSONObject jsonObj = new JSONObject(respuesta.toString());
    JSONObject jsonFeed = jsonObj.getJSONObject("feed");
    JSONArray jsonArr = jsonFeed.getJSONArray("entry");
    for (int j = 0; j < jsonArr.length(); j++) {
        String id = jsonArr.getJSONObject(j).getJSONObject("id").getString("$t");
        videos.add(id);
        System.out.println("Numero " + videosTotales + " " + id);
        videosTotales++;
    }
    i++;
}
When the program finishes, I have 5000 videos per category, but I need much more, much much more, and the limit is page = 100.
So, how can I get more than 10 million videos?
Thank you!
Are those 5000 also unique IDs?
I see the use of max-results=50, but not a start-index parameter in your URL (see the sketch below).
There is a limit on the number of results you can get per request, and also a limit on the number of requests you can send within a given time interval. By checking the status code of the response and any error message you can find these limits, as they may change over time.
Besides the category parameter, use some other parameters too. For instance, you can vary the q parameter (some search keywords) and/or the order parameter to get a different result set.
See the documentation for the available parameters.
Note that you are using API version 2, which is deprecated; there is an API version 3.
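To illustrate the start-index idea, an untested sketch of the paging loop follows; the page count here is an arbitrary bound for illustration, and the API will still enforce its own per-query result cap, so stop when it starts returning errors or empty feeds:
int maxResults = 50;
for (int page = 0; page < 20; page++) {         // 20 pages is just an illustrative bound
    int startIndex = page * maxResults + 1;     // gdata start-index is 1-based
    String peticion = "http://gdata.youtube.com/feeds/api/videos?category=Comedy&alt=json"
            + "&max-results=" + maxResults
            + "&start-index=" + startIndex;
    // fetch and parse exactly as in the question, then collect the entry IDs
}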

Query SQLite database for a GUID in the form X'3D98F71F3CD9415BA978C010b1CEF941'

I have an iOS project and data is written into an SQLite Database. For example, 'OBJECTROWID' in a table LDOCLINK stores info about a linked document.
OBJECTROWID starts off as a string with the format <3d98f71f 3cd9415b a978c010 b1cef941> but is cast to (NSData *) before being inserted into the database. The actual handling of the database insertion was written by a much more experienced programmer than myself. Anyway, the database browser (SQLite Manager) displays the OBJECTROWID column in the form X'3D98F71F3CD9415BA978C010b1CEF941'. I am a complete beginner with SQLite queries and cannot seem to return the correct row by using a WHERE clause with OBJECTROWID = or OBJECTROWID like.
SELECT * FROM LDOCLINK WHERE OBJECTROWID like '%';
gives all the rows (obviously) but I want the row where OBJECTROWID equals <3d98f71f 3cd9415b a978c010 b1cef941>. I have tried the following and none of them work:
SELECT * FROM LDOCLINK WHERE OBJECTROWID = 'X''3d98f71f3cd9415ba978c010b1cef941' no error - I thought that I was escaping the single quote that appears after the X but this didn't work
SELECT * FROM LDOCLINK WHERE OBJECTROWID like '%<3d98f71f 3cd9415b a978c010 b1cef941>%'
I cannot even get a match for two adjacent characters such as the initial 3D:
SELECT * FROM LDOCLINK WHERE OBJECTROWID like '%3d%' no error reported but it doesn't return anything.
SELECT * FROM LDOCLINK WHERE OBJECTROWID like '%d%' This is the strangest result as it returns ONLY the two rows that DON'T include my <3d98f71f 3cd9415b a978c010 b1cef941>, seemingly arbitrarily.
SELECT * FROM LDOCLINK WHERE OBJECTTYPE = '0' returns these same rows, just to illustrate that the interface works (SQLite Manager).
I also checked out this question and this one but I still could not get the correct query.
Please help me to return the correct row (actually two rows in this case - the first and third).
EDIT:
The code to write to database involves many classes. The method shown below is I think the main part of serialisation (case 8).
-(void)serializeValue:(NSObject*)value ToBuffer:(NSMutableData*)buffer
{
    switch (self.propertyTypeID) {
        case 0:
        {
            SInt32 length = 0;
            if ( (NSString*)value )
            {
                /*
                NSData* data = [((NSString*)value) dataUsingEncoding:NSUnicodeStringEncoding];
                // first 2 bytes are unicode prefix
                length = data.length - 2;
                [buffer appendBytes:&length length:sizeof(SInt32)];
                if ( length > 0 )
                    [buffer appendBytes:([data bytes]+2) length:length];
                */
                NSData* data = [((NSString*)value) dataUsingEncoding:NSUTF8StringEncoding];
                length = data.length;
                [buffer appendBytes:&length length:sizeof(SInt32)];
                if ( length > 0 )
                    [buffer appendBytes:([data bytes]) length:length];
            }
            else
                [buffer appendBytes:&length length:sizeof(SInt32)];
        }
            break;
        //depends on the realisation of DB serialisation
        case 1:
        {
            Byte b = 0;
            if ( (NSNumber*)value )
                b = [(NSNumber*)value boolValue] ? 1 : 0;
            [buffer appendBytes:&b length:1];
        }
            break;
        //........
        case 8:
        {
            int length = 16;
            [buffer appendBytes:[(NSData*)value bytes] length:length];
        }
            break;
        default:
            break;
    }
}
So, as pointed out by Tom Kerr, this post answered my question. Almost. The syntax wasn't exactly right. The form: SELECT * FROM LDOCLINK WHERE OBJECTROWID.Id = X'a8828ddfef224d36935a1c66ae86ebb3'; was suggested but I actually had to drop the .Id part.
Making:
SELECT * FROM LDOCLINK WHERE OBJECTROWID = X'3d98f71f3cd9415ba978c010b1cef941';
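As an aside (not part of the original answer): if you ever need a partial, LIKE-style match against a BLOB column, one option in SQLite is to compare its hexadecimal representation via the built-in hex() function, which returns upper-case hex digits:
SELECT * FROM LDOCLINK WHERE hex(OBJECTROWID) LIKE '3D98F71F%';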

LINQ, Skip and Take against Azure SQL Databse Not Working

I'm pulling a paged dataset in an ASP.NET MVC3 application which uses jQuery to get data for endless-scroll paging via an $.ajax call. The backend is an Azure SQL database. Here is the code:
[Authorize]
[OutputCache(Duration=0,NoStore=true)]
public PartialViewResult Search(int page = 1)
{
    int batch = 10;
    int fromRecord = 1;
    int toRecord = batch;
    if (page != 1)
    {
        //note these are correctly passed and set
        toRecord = (batch * page);
        fromRecord = (toRecord - (batch - 1));
    }
    IQueryable<TheTable> query;
    query = context.TheTable.Where(m => m.Username == HttpContext.User.Identity.Name)
        .OrderByDescending(d => d.CreatedOn)
        .Skip(fromRecord).Take(toRecord);
    //this should always be the batch size (10)
    //but seems to concatenate the previous results ???
    int count = query.ToList().Count();
    //results
    //call #1, count = 10
    //call #2, count = 20
    //call #3, count = 30
    //etc...
    PartialViewResult newPartialView = PartialView("Dashboard/_DataList", query.ToList());
    return newPartialView;
}
The data returned from each jQuery $.ajax call continues to GROW on each subsequent call rather than returning only 10 records per call, so the results contain all of the earlier calls' data as well. I've also added 'cache: false' to the $.ajax call. Any ideas on what is going wrong here?
The values you're passing to Skip and Take are wrong.
The argument to Skip should be the number of records you want to skip, which should be 0 on the first page.
The argument to Take needs to be the number of records you want to return, which will always be equal to batch.
Your code needs to be:
int batch = 10;
int fromRecord = 0;
if (page != 1)
{
    fromRecord = batch * (page - 1);
}

query = context.TheTable.Where(m => m.Username == HttpContext.User.Identity.Name)
    .OrderByDescending(d => d.CreatedOn)
    .Skip(fromRecord)
    .Take(batch);
