ADODB Recordset: How to fetch records asynchronously, in batches, without loading all records into heap memory

I am using ADODB from C++ to execute a query like Select * from table. The table holds around 4 GB of data, so when I execute the query it fetches all the records, growing the heap and loading everything into the client machine's virtual memory.
This is my code:
// Create Recordset object
ADODB::_RecordsetPtr m_pRecordSetData= NULL;
HRESULT hr = m_pRecordSetData.CreateInstance(__uuidof(ADODB::Recordset));
m_pRecordSetData->CursorType = ADODB::adOpenStatic;
m_pRecordSetData->CursorLocation = ADODB::adUseClient;
m_pRecordSetData->LockType = ADODB::adLockReadOnly;
// Create Command object
ADODB::_CommandPtr pCmdPtr = NULL;
hr = pCmdPtr.CreateInstance(__uuidof(ADODB::Command));
if (FAILED(hr))
{
    return pCmdPtr;
}
pCmdPtr->PutCommandType(ADODB::adCmdText);
pCmdPtr->PutCommandTimeout(0);
pCmdPtr->ActiveConnection = m_pConnection;
pCmdPtr->CommandText = L"Select * from Table";
// Execute the query: Select * from Table
_variant_t vtConn;
vtConn.vt = VT_ERROR;
vtConn.scode = DISP_E_PARAMNOTFOUND;
hr = m_pRecordSetData->Open(_variant_t((IDispatch *)pCmdPtr), vtConn,
    ADODB::adOpenStatic, ADODB::adLockReadOnly, -1);
When the last line executes, Recordset Open blocks until every record has been fetched, which consumes all of the heap memory on the system.
Expectation / Requirement: Can we fetch the records batch-wise, a few rows at a time, so that we can allocate that memory per batch and release it again instead of consuming all of the client-side heap? Can we fetch asynchronously and receive the next records batch by batch, so that we do not fetch all records in one go?
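A static client-side cursor (adUseClient + adOpenStatic) materializes the entire result set on the client before Open returns, so asynchronous options alone will not shrink the memory footprint. One approach worth trying, shown here as a minimal untested sketch (it assumes the usual #import of msado15.dll with the EOF property renamed, e.g. to adoEOF, and an already opened m_pConnection), is a server-side, forward-only, read-only cursor with a modest CacheSize, reading the rows in fixed-size batches with GetRows so that only one batch is held in client memory at a time:
// Sketch: server-side "firehose" cursor, read in batches instead of all at once.
ADODB::_RecordsetPtr pRs;
HRESULT hr = pRs.CreateInstance(__uuidof(ADODB::Recordset));
if (FAILED(hr)) return;
pRs->CursorLocation = ADODB::adUseServer;  // keep the rows on the server until they are fetched
pRs->CacheSize = 1000;                     // the provider buffers roughly this many rows at a time
pRs->Open(L"Select * from Table",
    _variant_t((IDispatch *)m_pConnection),
    ADODB::adOpenForwardOnly,              // forward-only cursor: no full client-side copy
    ADODB::adLockReadOnly,
    ADODB::adCmdText);
const long kBatchSize = 1000;
while (!pRs->adoEOF)                       // adoEOF: whatever name your #import gave the EOF property
{
    // GetRows advances the cursor and returns up to kBatchSize rows as a SAFEARRAY of VARIANTs.
    _variant_t vtRows = pRs->GetRows(kBatchSize, vtMissing, vtMissing);
    // ... process this batch; vtRows frees the SAFEARRAY when it is reassigned or goes out of scope ...
}
pRs->Close();
For completeness, ADO's asynchronous fetch options for client-side cursors (adAsyncFetch, adAsyncFetchNonBlocking) only move the population of the client cursor to a background thread; the full result set still ends up in client memory, so on their own they do not meet the requirement above.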

Related

Limit of 1024 stream entries in the handler in DolphinDB subscription?

n=1000000
colNames = `time`sym`vwap
colTypes = [MINUTE,SYMBOL,DOUBLE]
tmpTrades = table(n:0, colNames, colTypes)
lastMinute = [00:00:00.000]
enableTableShareAndPersistence(table=streamTable(n:0, colNames, colTypes), tableName="vwap_stream")
go
def calcVwap(mutable vwap, mutable tmpTrades, mutable lastMinute, msg){
    tmpTrades.append!(msg)
    curMinute = time(msg.time.last().minute()*60000l)
    t = select wavg(price, qty) as vwap from tmpTrades where time < curMinute, time >= lastMinute[0] group by time.minute(), sym
    if(t.size() == 0) return
    vwap.append!(t)
    t = select * from tmpTrades where time >= curMinute
    tmpTrades.clear!()
    lastMinute[0] = curMinute
    if(t.size() > 0) tmpTrades.append!(t)
}
subscribeTable(tableName="trades_stream", actionName="vwap", offset=-1, handler=calcVwap{vwap_stream, tmpTrades, lastMinute}, msgAsTable=true)
This is what I wrote to subscribe to the stream. Even though the data ingested into the publishing table vwap_stream comes in batches of 5000 records, the maximum number delivered to the handler is still 1024. Is there a limit on the subscription?
You can modify the configuration parameter maxMsgNumPerBlock to specify the maximum number of records in a message block; the default value is 1024.
In standalone mode the configuration file is dolphindb.cfg; in cluster mode it is cluster.cfg.
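For example, a single line in that configuration file would raise the limit (5000 here is only an illustrative value matching the batch size mentioned above); the node typically needs to be restarted for the change to take effect:
maxMsgNumPerBlock=5000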

What is the syntax for constructing an array of arrays using placement new?

I am trying to construct an array of arrays with placement new.
Searching the internet, I only managed to find how to construct a single array with placement new. What if I want an array of arrays instead?
I am not sure how to construct the inner arrays.
The memory manager's constructor already allocates a large buffer, its destructor deletes the buffer, and the Node operator new overload is already implemented.
This is my code:
map_size_x = terrain->get_map_width();
map_size_y = terrain->get_map_height();
grid_map = new Node *[map_size_x];
for (int i = 0; i < map_size_x; ++i)
{
    //grid_map[i] = new Node[map_size_y];
    grid_map[i] = new( buf + i * sizeof(Node)) Node;
}
buf is a char * that was already allocated with a big block of memory somewhere else (in a memory manager class) and should be big enough to hold sizeof(Node) * width * height bytes.
There is an operator new overload implemented in the Node class:
void *AStarPather::Node::operator new(std::size_t size, void* buffer)
{
    return buffer;
}
The result: the allocation seems to fail and the program hangs, but there is no crash.
I am using Visual Studio 2017.
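For reference, here is a minimal untested sketch of one way this layout is commonly built over a single pre-allocated buffer. It assumes buf points to at least map_size_x * map_size_y * sizeof(Node) bytes that are suitably aligned for Node, and that Node is default-constructible. The two differences from the loop above are that each row starts a whole row's worth of Nodes further into the buffer (not just sizeof(Node) further), and that every Node in the row is constructed individually:
// The row-pointer table is allocated normally; the Nodes themselves live inside buf.
Node **grid_map = new Node *[map_size_x];
for (int i = 0; i < map_size_x; ++i)
{
    // Row i starts i * map_size_y elements into the flat buffer.
    char *row_start = buf + static_cast<size_t>(i) * map_size_y * sizeof(Node);
    grid_map[i] = reinterpret_cast<Node *>(row_start);
    // Construct each Node of the row in place via the placement-new overload.
    for (int j = 0; j < map_size_y; ++j)
    {
        new (row_start + static_cast<size_t>(j) * sizeof(Node)) Node();
    }
}
Because placement new does not own the memory, the Nodes must be destroyed explicitly (grid_map[i][j].~Node()) before the memory manager releases buf; delete[] grid_map only frees the row-pointer table.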

Neo4j : Difference between cypher execution and Java API call?

Neo4j : Enterprise version 3.2
I see a tremendous difference between the following two approaches in terms of speed. Here are the settings and the query/API calls.
Page Cache : 16g | Heap : 16g
Number of row/nodes -> 600K
Cypher (ignore any syntax issues) | Time taken: 50 sec.
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///xyx.csv' AS row
CREATE (n:ObjectTension) SET n = row
From Java (session pool, with 15 sessions at a time as an example):
Thread_1 : Time Taken : 8 sec / 10K
Map<String, Object> pList = new HashMap<String, Object>();
try (Transaction tx = Driver.session().beginTransaction()) {
    for (int i = 0; i < 10000; i++) {
        pList.put(i, i * i);
        params.put("props", pList);
        String query = "Create(n:Label {props})";
        // String query = "create(n:Label) set n = {props})";
        tx.run(query, params);
    }
}
Thread_2 : Time taken is 9 sec / 10K
Map<String, Object> pList = new HashMap<String, Object>();
try (Transaction tx = Driver.session().beginTransaction()) {
    for (int i = 0; i < 10000; i++) {
        pList.put(i, i * i);
        params.put("props", pList);
        String query = "Create(n:Label {props})";
        // String query = "create(n:Label) set n = {props})";
        tx.run(query, params);
    }
}
.
.
.
Thread_3 : Basically the above code is reused; it's just an example.
Thread_N, where N = 600K / 10K.
Hence, the overall time taken is around 2~3 minutes.
The questions are the following:
How does the CSV load handle this internally? Does it open a single session and run multiple transactions within it?
Or
Does it create multiple sessions based on the "USING PERIODIC COMMIT 10000" parameter, so that 600K / 10000 gives 60 sessions? etc.
What's the best way to write this via Java?
The idea is to achieve the same write performance via Java as the CSV load, which loads 12000 nodes in ~5 seconds, or even better.
Your Java code is doing something very different than your Cypher code, so it really makes no sense to compare processing times.
You should change your Java code to read from the same CSV file. File IO is fairly expensive, but your Java code is not doing any.
Also, whereas your pure Cypher query is creating nodes with a fixed (and presumably relatively small) number of properties, your Java pList is growing in size with every loop iteration, so each successive node is created with anywhere from 1 to 10K properties! This may be the main reason why your Java code is much slower.
[UPDATE 1]
If you want to ignore the performance difference between using and not using a CSV file, the following (untested) code should give you an idea of what similar logic would look like in Java. In this example, the i loop assumes that your CSV file has 10 columns (you should adjust the loop to use the correct column count). Also, this example gives all the nodes the same properties, which is OK as long as you have not created a contrary uniqueness constraint.
Session session = Driver.session();
Map<String, Object> pList = new HashMap<String, Object>();
for (int i = 0; i < 10; i++) {
    pList.put(Integer.toString(i), i * i);
}
Map<String, Map> params = new HashMap<String, Map>();
params.put("props", pList);
String query = "create(n:Label) set n = {props}";
for (int j = 0; j < 60; j++) {
    try (Transaction tx = session.beginTransaction()) {
        for (int k = 0; k < 10000; k++) {
            tx.run(query, params);
        }
    }
}
[UPDATE 2 and 3, copied from chat and then fixed]
Since the Cypher planner is able to optimize, the actual internal logic is probably a lot more efficient than the Java code I provided (above). If you want to also optimize your Java code (which may be closer to the code that Cypher actually generates), try the following (untested) code. It sends 10000 rows of data in a single run() call, and uses the UNWIND clause to break it up into individual rows on the server.
Session session = Driver.session();
Map<String, Integer> pList = new HashMap<String, Integer>();
for (int i = 0; i < 10; i++) {
    pList.put(Integer.toString(i), i * i);
}
List<Map<String, Integer>> rows = Collections.nCopies(10000, pList);
Map<String, List> params = new HashMap<String, List>();
params.put("rows", rows);
String query = "UNWIND {rows} AS row CREATE (n:Label) SET n = row";
for (int j = 0; j < 60; j++) {
    try (Transaction tx = session.beginTransaction()) {
        tx.run(query, params);
    }
}
You can try creating the nodes using the Java API instead of relying on Cypher:
createNode - http://neo4j.com/docs/java-reference/current/javadocs/org/neo4j/graphdb/GraphDatabaseService.html#createNode-org.neo4j.graphdb.Label...-
setProperty - http://neo4j.com/docs/java-reference/current/javadocs/org/neo4j/graphdb/PropertyContainer.html#setProperty-java.lang.String-java.lang.Object-
Also, as the previous answer mentioned, the props variable has different values in your two cases.
Additionally, notice that every iteration performs query parsing (String query = "Create(n:Label {props})";), unless it is optimized out by neo4j itself.

Single array in the hdf5 file

Image of my dataset:
I am using HDF5DotNet with C# and I can only read the full dataset shown in the attached image. The HDF5 file is very big, close to 10 GB, and if I load the whole array into memory I run out of memory.
I would like to read all the data from rows 5 and 7 in the attached image. Is there any way to read only these two rows into memory at a time, without having to load all the data into memory first?
private static void OpenH5File()
{
    var h5FileId = H5F.open(@"D:\Sandbox\Flood Modeller\Document\xmdf_results\FMA_T1_10ft_001.xmdf", H5F.OpenMode.ACC_RDONLY);
    string dataSetName = "/FMA_T1_10ft_001/Temporal/Depth/Values";
    var dataset = H5D.open(h5FileId, dataSetName);
    var space = H5D.getSpace(dataset);
    var dataType = H5D.getType(dataset);
    long[] offset = new long[2];
    long[] count = new long[2];
    long[] stride = new long[2];
    long[] block = new long[2];
    offset[0] = 1; // start at row 5
    offset[1] = 2; // start at column 0
    count[0] = 2; // read 2 rows
    count[0] = 165701; // read all columns
    stride[0] = 0; // don't skip anything
    stride[1] = 0;
    block[0] = 1; // blocks are single elements
    block[1] = 1;
    // Dataspace associated with the dataset in the file
    // Select a hyperslab from the file dataspace
    H5S.selectHyperslab(space, H5S.SelectOperator.SET, offset, count, block);
    // Dimensions of the file dataspace
    var dims = H5S.getSimpleExtentDims(space);
    // We also need a memory dataspace which is the same size as the file dataspace
    var memspace = H5S.create_simple(2, dims);
    double[,] dataArray = new double[1, dims[1]]; // just get one array
    var wrapArray = new H5Array<double>(dataArray);
    // Now we can read the hyperslab
    H5D.read(dataset, dataType, memspace, space,
        new H5PropertyListId(H5P.Template.DEFAULT), wrapArray);
}
You need to select a hyperslab which has the correct offset, count, stride, and block for the subset of the dataset that you wish to read. These are all arrays which have the same number of dimensions as your dataset.
The block is the size of the element block in each dimension to read, i.e. 1 is a single element.
The offset is the number of blocks from the start of the dataset to start reading, and count is the number of blocks to read.
You can select non-contiguous regions by using stride, which again counts in blocks.
I'm afraid I don't know C#, so the following is in C. In your example, you would have:
hsize_t offset[2], count[2], stride[2], block[2];
offset[0] = 5; // start at row 5
offset[1] = 0; // start at column 0
count[0] = 2; // read 2 rows
count[1] = 165702; // read all columns
stride[0] = 1; // don't skip anything
stride[1] = 1;
block[0] = 1; // blocks are single elements
block[1] = 1;
// This assumes you already have an open dataspace with ID dataspace_id
H5Sselect_hyperslab(dataspace_id, H5S_SELECT_SET, offset, stride, count, block);
You can find more information on reading/writing hyperslabs in the HDF5 tutorial.
It seems there are two forms of H5D.read in C#; you want the second form:
H5D.read(Type) Method (H5DataSetId, H5DataTypeId, H5DataSpaceId,
H5DataSpaceId, H5PropertyListId, H5Array(Type))
This allows you to specify the memory and file dataspaces. Essentially, you need one dataspace which has information about the size, stride, offset, etc. of the variable in memory that you want to read into, and one dataspace for the dataset in the file that you want to read from. This lets you do things like read from a non-contiguous region in a file into a contiguous region in an array in memory.
You want something like:
// Dataspace associated with the dataset in the file
var dataspace = H5D.get_space(dataset);
// Select a hyperslab from the file dataspace
H5S.selectHyperslab(dataspace, H5S.SelectOperator.SET, offset, count);
// Dimensions of the file dataspace
var dims = H5S.getSimpleExtentDims(dataspace);
// We also need a memory dataspace which is the same size as the file dataspace
var memspace = H5S.create_simple(rank, dims);
// Now we can read the hyperslab
H5D.read(dataset, datatype, memspace, dataspace,
new H5PropertyListId(H5P.Template.DEFAULT), wrapArray);
From your posted code, I think I've spotted the problem. First you do this:
var space = H5D.getSpace(dataset);
then you do
var dataspace = H5D.getSpace(dataset);
These two calls do the same thing, but they create two different variables.
You call H5S.selectHyperslab with space, but H5D.read uses dataspace.
You need to make sure you are using the correct variables consistently. If you remove the second call to H5D.getSpace, and change dataspace -> space, it should work.
Maybe you want to have a look at HDFql, as it abstracts you from the low-level details of HDF5. Using HDFql in C#, you can read rows #5 and #7 of dataset Values with a hyperslab selection like this:
float [,]data = new float[2, 165702];
HDFql.Execute("SELECT FROM Values(5:2:2:1) INTO MEMORY " + HDFql.VariableTransientRegister(data));
Afterwards, you can access these rows through variable data. Example:
for(int x = 0; x < 2; x++)
{
    for(int y = 0; y < 165702; y++)
    {
        System.Console.WriteLine(data[x, y]);
    }
}

LINQ, Skip and Take against Azure SQL Database Not Working

I'm pulling a paged dataset in an ASP.NET MVC3 application which uses jQuery to get data for endless scroll paging via an $.ajax call. The backend is an Azure SQL database. Here is the code:
[Authorize]
[OutputCache(Duration=0,NoStore=true)]
public PartialViewResult Search(int page = 1)
{
    int batch = 10;
    int fromRecord = 1;
    int toRecord = batch;
    if (page != 1)
    {
        //note these are correctly passed and set
        toRecord = (batch * page);
        fromRecord = (toRecord - (batch - 1));
    }
    IQueryable<TheTable> query;
    query = context.TheTable.Where(m => m.Username == HttpContext.User.Identity.Name)
        .OrderByDescending(d => d.CreatedOn)
        .Skip(fromRecord).Take(toRecord);
    //this should always be the batch size (10)
    //but seems to concatenate the previous results ???
    int count = query.ToList().Count();
    //results
    //call #1, count = 10
    //call #2, count = 20
    //call #3, count = 30
    //etc...
    PartialViewResult newPartialView = PartialView("Dashboard/_DataList", query.ToList());
    return newPartialView;
}
The data returned from each jQuery $.ajax call continues to GROW with each subsequent call rather than returning only 10 records per call, so the results contain all of the earlier calls' data as well. I've also added 'cache=false' to the $.ajax call. Any ideas on what is going wrong here?
The values you're passing to Skip and Take are wrong.
The argument to Skip should be the number of records you want to skip, which should be 0 on the first page;
The argument to Take needs to be the number of records you want to return, which will always be equal to batch;
Your code needs to be:
int batch = 10;
int fromRecord = 0;
int toRecord = batch;
if (page != 1)
{
    fromRecord = batch * (page - 1);
}
query = context.TheTable.Where(m => m.Username == HttpContext.User.Identity.Name)
    .OrderByDescending(d => d.CreatedOn)
    .Skip(fromRecord)
    .Take(toRecord); // toRecord always equals batch
