PHP variable reverts to its last assigned value after intensive curl operation - memory

I'm querying one API and sending data to another, and I'm also querying a MySQL database. I do all of this about 40 times in one second, then wait a minute and repeat. I have a feeling I'm at the limit of what PHP can do.
My question is about two variables that randomly revert to their last value from the previous loop. They only change their value after the call to self::api_call() (below, in the second function). Both $product and $productId will randomly change their value, about once every 40 loops or so.
I upgraded PHP to 7.2, increased the memory limit to 512 MB, and assigned some variables to null to save memory. I'm not getting any official memory warnings, but watching the variables randomly go back to their last value is perplexing. Here's what the code looks like.
/**
* The initial create products loop which calls the secondary function where
* the variables can change.
**/
public static function createProducts() {
    // Create connection
    $conn = new mysqli(SERVERNAME, USERNAME, PASSWORD, DBNAME, PORT);
    // Check connection
    if ($conn->connect_error) {
        die("Connection failed: " . $conn->connect_error);
    }
    // This will go through each row and echo the id column
    $productResults = mysqli_query($conn, "SELECT * FROM product_creation_queue");
    if (mysqli_num_rows($productResults) > 0) {
        $rowIndex = 0;
        while ($row = mysqli_fetch_assoc($productResults)) {
            // pass the queued product JSON along (the column name is assumed here)
            self::createProduct($conn, $row['product']);
        }
    }
}
/**
 * The second function, where I see both $product and $productId changing
 * from time to time, which completely breaks the code. Their values
 * only change after the call to self::api_call(), which is simply a
 * curl function to hit an API endpoint.
 **/
public static function createProduct($mysqlConnection, $product) {
    // convert back to array from json
    $productArray = json_decode($product, TRUE);
    // here the value of $productId is one thing
    $productId = $productArray['product']['id'];
    // here is the curl call
    $addProduct = self::api_call(TOKEN, SHOP, ENDPOINT, $product, 'POST');
    // and randomly here it can revert to its last value from a previous loop
    echo $productId;
}

The problem was that the entire 40-query procedure sometimes took more than one minute to complete, and the cron job that started the procedure on the minute would kick off the next run before the first one had finished, thereby somehow re-assigning variables on the fly. The queries usually took less than one minute, but when they took longer the conflicts appeared, which is what made the behavior look random.
I reduced the number of queries per minute so that the process now completes in under 60 seconds and no variables are ever overwritten. I still don't understand how the variables could change if two PHP processes are running at the same time--it seems like they would be siloed.
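For what it's worth, the processes themselves are siloed -- each PHP process has its own copies of $product and $productId. What overlapping runs do share is the product_creation_queue table, so two runs can each pull the same rows, and their interleaved output can look like a single variable jumping back to an old value. Besides reducing the workload, another option is to make sure only one run can be active at a time. Below is a minimal sketch of that idea as a lock file; it is written in TypeScript/Node purely for illustration (the path, job body, and logging are placeholders), and PHP's flock() or an equivalent lock file gives the same guard.
// run-once.ts -- hypothetical sketch: skip this run if the previous one still holds the lock file
import { openSync, closeSync, unlinkSync } from "fs";

const LOCK_PATH = "/tmp/create-products.lock"; // placeholder path

function runJobOnce(job: () => void): void {
  let fd = -1;
  try {
    // "wx" fails if the file already exists, i.e. another run is in progress
    fd = openSync(LOCK_PATH, "wx");
  } catch {
    console.log("previous run still in progress, skipping this minute");
    return;
  }
  try {
    job(); // the ~40 API calls / DB queries for this minute go here
  } finally {
    closeSync(fd);
    unlinkSync(LOCK_PATH);
  }
}

runJobOnce(() => console.log("processing product_creation_queue..."));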

Related

SP execution time

I commented out the entire body of my SP except the parameter declarations. The SP now looks like the example below; note that all other parts of the body are commented out.
OUT PO_ERROR INTEGER,
IN PI_CURRENT_DATE INTEGER,
IN PI_USER_ID DECIMAL(15),
IN PI_BID DECIMAL(15),
IN PI_AID DECIMAL(15),
IN PI_UUID VARCHAR(36),
IN PI_XML XML,
OUT PO_VERSION INTEGER,
OUT PO_ERROR_MSG INTEGER,
OUT PO_BID DECIMAL(15),
OUT PO_STEP INTEGER
SPECIFIC ESPNAME1
RESULT SETS 1
MODIFIES SQL DATA
NOT DETERMINISTIC
NULL CALL
LANGUAGE SQL
BODY: BEGIN
DECLARE L_SQLCODE INT DEFAULT 0;
DECLARE SQLCODE INTEGER DEFAULT 0;
DECLARE L_AID INTEGER DEFAULT 0;
DECLARE L_BNO INTEGER DEFAULT 0;
DECLARE L_BID INTEGER DEFAULT 0;
DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
SET L_SQLCODE = SQLCODE;
DECLARE CONTINUE HANDLER FOR NOT FOUND
SET L_SQLCODE = SQLCODE;
SET PO_ERROR = 0;
SET PO_STEP = 0;
SET PO_ERROR_MSG = 0;
COMMIT;
END BODY
Question: I run the SP with the specified input parameters, and every time the execution time of the SP is in the range of 140 ms to 180 ms. I think that is a lot for an SP with an empty body. What is wrong here? Does this time include the time to get a connection? If so, how can I check the SP execution time without the connection time?
Note that I tried removing PI_XML from the input parameters, because I thought the XML input might be increasing the execution time, but nothing changed and the execution time is still in that range.
It's a lot easier to measure the elapsed time of just the stored procedure part if you capture the start and end times inside the procedure itself. One way to accomplish this is to temporarily add a couple of output parameters to it, like this:
CREATE PROCEDURE ...
OUT PO_START TIMESTAMP,
OUT PO_END TIMESTAMP )
...
BODY:BEGIN
SET PO_START = CURRENT TIMESTAMP;
... -- Rest of the procedure
SET PO_END = CURRENT TIMESTAMP;
END BODY
In a do-nothing procedure such as the one you're currently testing, I'd be surprised if PO_START and PO_END differ by more than a handful of milliseconds. The rest of the elapsed time could be caused by any of the following:
Client opens a database connection and authenticates
Database was not already activated

Querying local cache first before querying server in Breeze JS

I have an app using Breeze to query the data. I want to first check the local cache and then the server cache if no results are returned (I followed John Papa's SPA jumpstart course). However, I have found a flaw in my logic which I am not sure how to fix. Assuming I have 10 items that match my query.
Situation 1 (which works): I go to list page (Page A) displaying all 10. Hits server as cache is empty and adds all 10 to the cache. Then go to page displaying 1 result (Page B) which is found in the cache. So all good.
Situation 2 (the problem): I go to the page displaying 1 record first (Page B). Then I go to my list page (Page A) which checks the cache and finds 1 record and because of this line ( if (recordsInCache.length > 0)) it exits and only shows that 1 record.
I somehow need to know that there are more records on the server (9) that are NOT in the cache, ie. the total records for this query is actually 10, I have 1 therefore I have to hit server for the other 9.
Here is my query for Page A:
function getDaresToUser(daresObservable, criteria, forceServerCall)
{
var query = EntityQuery.from('Dares')
.where('statusId', '!=', enums.dareStatus.Deleted)
.where('toUserId', '==', criteria.userId)
.expand("fromUser, toUser")
.orderBy('deadlineDate, changedDate');
return dataServiceHelper.executeQuery(query, daresObservable, false, forceServerCall);
}
and here is my query for Page B (single item)
function getDare(dareObservable, criteria, forceServerCall)
{
var query = EntityQuery.from('Dares')
.expand("fromUser, toUser")
.where('dareId', '==', criteria.dareId);
return dataServiceHelper.executeQuery(query, dareObservable, true, forceServerCall);
}
function executeQuery(query, itemsObservable, singleEntity, forceServerCall)
{
//check local cache first
if (!manager.metadataStore.isEmpty() && !forceServerCall)
{
var recordsInCache = executeLocalQuery(query, itemsObservable, singleEntity);
if (recordsInCache.length > 0)
{
callCompleted();
return Q.resolve();
}
}
return manager.executeQuery(query)
.then(querySucceeded)
.fail(queryFailed);
}
function executeLocalQuery(query, itemsObservable, singleEntity)
{
var recordsInCache = manager.executeQueryLocally(query);
if (recordsInCache.length > 0)
{
processQueryResults(recordsInCache, itemsObservable, singleEntity, true);
}
return recordsInCache;
}
Any advice appreciated...
If you want to hit the server just for comparison purposes, then at some point (either when loading up your app or when you hit the list page) call inlineCount to compare the total on the server vs. what you already have, as shown in this answer: stackoverflow.com/questions/16390897/counts-in-breeze-js/…
A way you can use this creatively while you are querying for the single record would be like this -
Set some variable in your view model or somewhere equal to total count
var totalCount = 0;
When you query the single record get the inline count -
var query = EntityQuery.from('Dares')
.expand("fromUser, toUser")
.where('dareId', '==', criteria.dareId)
.inlineCount(true);
and set totalCount = data.inlineCount. Do the same thing when you get the full items list -- set totalCount from the inlineCount then too, so you always know whether you have all of the entities.
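Putting those two pieces together, the list call can then decide whether the cache is already complete by comparing what's cached locally against the last known inlineCount. A rough sketch of that check follows; the knownTotal variable and the surrounding wiring are assumptions, and only executeQueryLocally, executeQuery and inlineCount are actual Breeze calls.
declare const manager: any; // the Breeze EntityManager used in the question
declare const Q: any;       // the promise library already used above

// Last inlineCount seen for this list query; 0 means "never asked the server".
let knownTotal = 0;

function getList(query: any) {
  const cached = manager.executeQueryLocally(query);
  if (knownTotal > 0 && cached.length >= knownTotal) {
    return Q.resolve(cached); // cache already holds everything the server has
  }
  // Cache may be partial (or we never asked): hit the server and remember the count.
  return manager.executeQuery(query.inlineCount(true)).then(function (data: any) {
    knownTotal = data.inlineCount;
    return data.results;
  });
}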
I've been thinking about this problem more in the last year (and have since moved from Durandal + Breeze to Angular + Breeze). In Angular you can cache the service call easily using
return $resource(xyz + params, {}, {'query': { method:'GET', cache: true, isArray:true }}).query(successArrayDataLoad, errorDataLoad);
I guess Angular caches the params of this query and knows when it already has it. So when I switch this method to use Breeze I lose this functionality, and all my list calls hit the server every time.
So the real problem here is List data. Single Entities can always check the local cache and if nothing is returned then check the server (because you expect exactly 1).
However, List data varies by params. For example, if I have a GetGames call which takes in a CreatedByUserId, every time I supply a new CreatedByUserId I have to go back to the server.
So I think what I really need to do here to cache my List calls is to cache the Key for each call which is a combination of the QueryName and the Params.
For example, GetGames1 for UserID 1 and then GetGames2 for UserId 2.
The logic would be: Check the Angular cache to see if this call has been made before in this session. If it has, then check the local cache first. If nothing is returned, check the server.
If it has not, check the server as the local cache MIGHT have some data in it for this query but it's not guaranteed to be the full set.
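A sketch of that bookkeeping, keyed on the query name plus its params, could look like the following. The names here (executedQueries, executeQueryCached) are made up; only executeQueryLocally and executeQuery are Breeze calls.
declare const manager: any; // Breeze EntityManager
declare const Q: any;       // promise library

// Which (queryName + params) combinations have already been fetched from the
// server in this session.
const executedQueries = new Set<string>();

function executeQueryCached(queryName: string, params: object, query: any) {
  const cacheKey = queryName + JSON.stringify(params); // e.g. 'GetGames{"createdByUserId":1}'
  if (executedQueries.has(cacheKey)) {
    // This exact list query has been run before, so the cache holds the full set.
    return Q.resolve(manager.executeQueryLocally(query));
  }
  // First time for this key: the cache might be partial, so go to the server.
  return manager.executeQuery(query).then(function (data: any) {
    executedQueries.add(cacheKey);
    return data.results;
  });
}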
The only other way around it would be to hit the server each time first to get a count for that Query + Params and then hit the local cache and compare the count, but that is more inefficient.
Thoughts?

DataNucleus Memory/Cache Handling for large update/insert

We are running an application in a Spring context, using DataNucleus as our ORM and MySQL as our database.
Our application has a daily job that imports a data feed into our database. The size of the feed translates into around 1 million rows of inserts/updates. The performance of the import starts out very good, but then it degrades over time (as the number of executed queries increases), and at some point the application freezes or stops responding. We have to wait for the whole job to complete before the application responds again.
This behavior looks very much like a memory leak to us, and we have been looking hard at our code to catch any potential problem, but the problem didn't go away. One interesting thing we found from the heap dump is that org.datanucleus.ExecutionContextThreadedImpl (or the HashSet/HashMap it holds) takes up 90% of our memory (5 GB) during the import (I have attached screenshots of the dump below). My research on the internet says this reference is the Level 1 cache (not sure if I am correct). My question is: during a large import, how can I limit/control the size of the Level 1 cache? Maybe ask DN not to cache during my import?
If that's not the L1 cache, what's the possible cause of my memory issue?
Our code uses a transaction for every insert to prevent locking large chunks of data in the database. It calls the flush method every 2000 inserts.
As a temporary fix, we moved our import process to run overnight when no one is using our app. Obviously, this cannot go on forever. Could someone please at least point us in the right direction so that we can do more research and hopefully find a fix?
It would also be good if someone has experience decoding the heap dump.
Your help would be very much appreciated by all of us here. Many thanks!
https://s3-ap-southeast-1.amazonaws.com/public-external/datanucleus_heap_dump.png
https://s3-ap-southeast-1.amazonaws.com/public-external/datanucleus_dump2.png
Code below - the caller of this method does not have a transaction. This method processes one import object per call, and we need to process around 100K of these objects daily.
@Override
@PreAuthorize("hasUserRole('ROLE_ADMIN')")
@Transactional(propagation = Propagation.REQUIRED)
public void processImport(ImportInvestorAccountUpdate account, String advisorCompanyKey) {
ImportInvestorAccountDescriptor invAccDesc = account
.getInvestorAccount();
InvestorAccount invAcc = getInvestorAccountByImportDescriptor(
invAccDesc, advisorCompanyKey);
try {
ParseReportingData parseReportingData = ctx
.getBean(ParseReportingData.class);
String baseCCY = invAcc.getBaseCurrency();
Date valueDate = account.getValueDate();
ArrayList<InvestorAccountInformationILAS> infoList = parseReportingData
.getInvestorAccountInformationILAS(null, invAcc, valueDate,
baseCCY);
InvestorAccountInformationILAS info = infoList.get(0);
PositionSnapshot snapshot = new PositionSnapshot();
ArrayList<Position> posList = new ArrayList<Position>();
Double totalValueInBase = 0.0;
double totalQty = 0.0;
for (ImportPosition importPos : account.getPositions()) {
Asset asset = getAssetByImportDescriptor(importPos
.getTicker());
PositionInsurance pos = new PositionInsurance();
pos.setAsset(asset);
pos.setQuantity(importPos.getUnits());
pos.setQuantityType(Position.QUANTITY_TYPE_UNITS);
posList.add(pos);
}
snapshot.setPositions(posList);
info.setHoldings(snapshot);
log.info("persisting a new investorAccountInformation(source:"
+ invAcc.getReportSource() + ") on " + valueDate
+ " of InvestorAccount(key:" + invAcc.getKey() + ")");
persistenceService.updateManagementEntity(invAcc);
} catch (Exception e) {
throw new DataImportException(invAcc == null ? null : invAcc.getKey(), advisorCompanyKey,
e.getMessage());
}
}
Do you use the same pm for the entire job?
If so, you may want to try to close and create new ones once in a while.
If not, this could be the L2 cache. What setting do you have for datanucleus.cache.level2.type? I think it's a weak map by default. You may want to try none for testing.
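For reference, that setting goes into the persistence properties, e.g. something like the snippet below (a test-only sketch using the property named above; setting it to none just rules the L2 cache in or out as a suspect, it is not a production recommendation):
# datanucleus.properties (testing only)
datanucleus.cache.level2.type=none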

How to implement pagination when using amazon Dynamo DB in rails

I want to use Amazon DynamoDB with Rails, but I have not found a way to implement pagination.
I will use AWS::Record::HashModel as ORM.
This ORM supports limits like this:
People.limit(10).each {|person| ... }
But I could not figure out how to implement the following MySQL query in DynamoDB.
SELECT *
FROM `People`
LIMIT 1 , 30
You issue queries using LIMIT. If the subset returned does not contain the full table, a LastEvaluatedKey value is returned. You use this value as the ExclusiveStartKey in the next query. And so on...
From the DynamoDB Developer Guide.
You can provide 'page-size' in your query to set the result set size.
The response from DynamoDB contains 'LastEvaluatedKey', which indicates the last key for that page size. If the response doesn't contain 'LastEvaluatedKey', there are no results left to fetch.
Use the 'LastEvaluatedKey' as the 'ExclusiveStartKey' when fetching the next time.
I hope this helps.
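The question is about Rails, but as a concrete illustration of that LastEvaluatedKey / ExclusiveStartKey loop, here is a small sketch with the AWS SDK for JavaScript (v2); the table name and page size are placeholders.
import { DynamoDB } from "aws-sdk";

const db = new DynamoDB.DocumentClient();

// Fetch one "page", starting after the key returned with the previous page.
async function getPage(pageSize: number, startKey?: DynamoDB.DocumentClient.Key) {
  const res = await db.scan({
    TableName: "People",         // placeholder table name
    Limit: pageSize,             // analogous to the SQL LIMIT
    ExclusiveStartKey: startKey, // undefined on the very first page
  }).promise();
  return {
    items: res.Items || [],
    nextKey: res.LastEvaluatedKey, // pass this back in to get the next page
  };
}

// Usage: first page, then the next one.
// const page1 = await getPage(30);
// const page2 = await getPage(30, page1.nextKey);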
DynamoDB Pagination
Here's a simple copy-paste-and-run proof of concept (Node.js) for stateless forward/reverse navigation with DynamoDB. In summary, each response includes the navigation history, allowing the user to explicitly and consistently request either the next or the previous page (while next/prev params exist):
GET /accounts -> first page
GET /accounts?next=A3r0ijKJ8 -> next page
GET /accounts?prev=R4tY69kUI -> previous page
Considerations:
If your ids are large and/or users might do a lot of navigation, then the potential size of the next/prev params might become too large.
Yes you do have to store the entire reverse path - if you only store the previous page marker (per some other answers) you will only be able to go back one page.
It won't handle changing pageSize midway, consider baking pageSize into the next/prev value.
base64 encode the next/prev values, and you could also encrypt.
Scans are inefficient, while this suited my current requirement it won't suit all!
// demo.js
const mockTable = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]

const getPagedItems = (pageSize = 5, cursor = {}) => {
  // Parse cursor
  const keys = cursor.next || cursor.prev || [] // fwd first
  let key = keys[keys.length-1] || null // eg ddb's PK

  // Mock query (mimic dynamodb response)
  const Items = mockTable.slice(parseInt(key) || 0, pageSize+key)
  const LastEvaluatedKey = Items[Items.length-1] < mockTable.length
    ? Items[Items.length-1] : null

  // Build response
  const res = {items:Items}
  if (keys.length > 0) // add reverse nav keys (if any)
    res.prev = keys.slice(0, keys.length-1)
  if (LastEvaluatedKey) // add forward nav keys (if any)
    res.next = [...keys, LastEvaluatedKey]
  return res
}

// Run test ------------------------------------
const runTest = () => {
  const PAGE_SIZE = 6
  let x = {}, i = 0

  // Page to end
  while (i == 0 || x.next) {
    x = getPagedItems(PAGE_SIZE, {next:x.next})
    console.log(`Page ${++i}: `, x.items)
  }

  // Page back to start
  while (x.prev) {
    x = getPagedItems(PAGE_SIZE, {prev:x.prev})
    console.log(`Page ${--i}: `, x.items)
  }
}

runTest()
I faced a similar problem.
The generic pagination approach is to use a "start index" (or "start page") and a "page length".
The "ExclusiveStartKey" and "LastEvaluatedKey" based approach is very DynamoDB specific.
I feel this DynamoDB-specific implementation of pagination should be hidden from the API client/UI.
Also, if the application is serverless, using a service like Lambda, it will not be possible to maintain state on the server; on the other hand, the client implementation would become very complex.
I came up with a different approach, which I think is generic (and not specific to DynamoDB):
When the API client specifies the start index, fetch all the keys from the table and store them in an array.
Find the key for the start index, as specified by the client, in that array.
Use it as the ExclusiveStartKey and fetch the number of records specified by the page length.
If the start index parameter is not present, the above steps are not needed; we don't need to specify an ExclusiveStartKey in the scan operation.
This solution has some drawbacks:
We will need to fetch all the keys when the user asks for pagination with a start index.
We will need additional memory to store the ids and the indexes.
There are additional database scan operations (one or more to fetch the keys).
But I feel this is a very easy approach for the clients that use our APIs. Backward navigation works seamlessly, and if the user wants to see the "nth" page, that is possible too. A sketch of the idea is shown below.
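Here is that sketch with the AWS SDK for JavaScript (table and key names are made up); it trades extra scans and memory for page numbers that are friendly to the client.
import { DynamoDB } from "aws-sdk";

const db = new DynamoDB.DocumentClient();
const TABLE = "People"; // placeholder
const KEY_ATTR = "id";  // placeholder partition-key attribute

// Step 1: fetch only the keys (may take several scans on a large table).
async function fetchAllKeys() {
  const keys: any[] = [];
  let startKey: DynamoDB.DocumentClient.Key | undefined;
  do {
    const res = await db.scan({
      TableName: TABLE,
      ProjectionExpression: KEY_ATTR, // keys only, to keep the payload small
      ExclusiveStartKey: startKey,
    }).promise();
    keys.push(...(res.Items || []));
    startKey = res.LastEvaluatedKey;
  } while (startKey);
  return keys;
}

// Steps 2-3: turn a start index into an ExclusiveStartKey, then read one page.
async function getPageByIndex(startIndex: number, pageLength: number) {
  const keys = await fetchAllKeys();
  // The key just before the start index becomes the ExclusiveStartKey.
  const startAfter = startIndex > 0 ? keys[startIndex - 1] : undefined;
  const res = await db.scan({
    TableName: TABLE,
    Limit: pageLength,
    ExclusiveStartKey: startAfter,
  }).promise();
  return res.Items || [];
}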
In fact I faced the same problem, and I noticed that LastEvaluatedKey and ExclusiveStartKey were not working well for me, especially when using Scan, so I solved it like this:
GET /?page_no=1&page_size=10 =====> first page
The response will contain the count of records and the first 10 records.
Retry and increase the page number until all records have come back.
Code is below.
PS: I am using python
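# 'response' below is assumed to hold the result of an earlier table.scan()/query() call (boto3)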
first_index = ((page_no-1)*page_size)
second_index = (page_no*page_size)
if (second_index > len(response['Items'])):
    second_index = len(response['Items'])
return {
    'statusCode': 200,
    'count': response['Count'],
    'response': response['Items'][first_index:second_index]
}

nsIProtocolHandler: trouble loading image for html page

I'm building an nsIProtocolHandler implementation in Delphi. (more here)
And it's working already. Data the module builds gets streamed over an nsIInputStream. I've got all the nsIRequest, nsIChannel and nsIHttpChannel methods and properties working.
I've started testing and I run into something strange. I have a page "a.html" with this simple HTML:
<img src="a.png">
Both "xxm://test/a.html" and "xxm://test/a.png" work in Firefox, and give above HTML or the PNG image data.
The problem is with displaying the HTML page, the image doesn't get loaded. When I debug, I see:
NewChannel gets called for a.png, (when Firefox is processing an OnDataAvailable notice on a.html),
NotificationCallbacks is set (I only need to keep a reference, right?)
RequestHeader "Accept" is set to "image/png,image/*;q=0.8,*/*;q=0.5"
but then, the channel object is released (most probably due to a zero reference count)
Looking at other requests, I would expect some other properties to get set (such as LoadFlags or OriginalURI) and AsyncOpen to get called, from where I can start getting the request responded to.
Does anybody recognise this? Am I doing something wrong? Perhaps with LoadFlags or the LoadGroup? I'm not sure when to call AddRequest and RemoveRequest on the LoadGroup, and peeking at nsHttpChannel and nsBaseChannel I'm not sure whether it's better to call RemoveRequest early or late (before or after OnStartRequest or OnStopRequest).
Update: Checked on the freshly new Firefox 3.5, still the same
Update: To try to further isolate the issue, I tried "file://test/a1.html" with <img src="xxm://test/a.png" /> and still only get the above sequence of events. If I'm supposed to add this secondary request to a load group to get AsyncOpen called on it, I have no idea where to get a reference to one.
There's more: I find only one instance of the "Accept" string that gets added to the request headers; it queries for nsIHttpChannelInternal right after creating a new channel, but I don't even get this QueryInterface call through... (I posted it here)
Me again.
I am going to quote the same stuff from nsIChannel::asyncOpen():
If asyncOpen returns successfully, the channel is responsible for keeping itself alive until it has called onStopRequest on aListener or called onChannelRedirect.
If you go back to nsViewSourceChannel.cpp, there's one place where loadGroup->AddRequest is called and two places where loadGroup->RemoveRequest is being called.
nsViewSourceChannel::AsyncOpen(nsIStreamListener *aListener, nsISupports *ctxt)
{
NS_ENSURE_TRUE(mChannel, NS_ERROR_FAILURE);
mListener = aListener;
/*
* We want to add ourselves to the loadgroup before opening
* mChannel, since we want to make sure we're in the loadgroup
* when mChannel finishes and fires OnStopRequest()
*/
nsCOMPtr<nsILoadGroup> loadGroup;
mChannel->GetLoadGroup(getter_AddRefs(loadGroup));
if (loadGroup)
loadGroup->AddRequest(NS_STATIC_CAST(nsIViewSourceChannel*,
this), nsnull);
nsresult rv = mChannel->AsyncOpen(this, ctxt);
if (NS_FAILED(rv) && loadGroup)
loadGroup->RemoveRequest(NS_STATIC_CAST(nsIViewSourceChannel*,
this),
nsnull, rv);
if (NS_SUCCEEDED(rv)) {
mOpened = PR_TRUE;
}
return rv;
}
and
nsViewSourceChannel::OnStopRequest(nsIRequest *aRequest, nsISupports* aContext,
nsresult aStatus)
{
NS_ENSURE_TRUE(mListener, NS_ERROR_FAILURE);
if (mChannel)
{
nsCOMPtr<nsILoadGroup> loadGroup;
mChannel->GetLoadGroup(getter_AddRefs(loadGroup));
if (loadGroup)
{
loadGroup->RemoveRequest(NS_STATIC_CAST(nsIViewSourceChannel*,
this),
nsnull, aStatus);
}
}
return mListener->OnStopRequest(NS_STATIC_CAST(nsIViewSourceChannel*,
this),
aContext, aStatus);
}
Edit:
As I have no clue about how Mozilla works, I have to guess from reading some code. From the channel's point of view, once the original file is loaded, its job is done. If you want to load secondary items linked from the file, like an image, you have to implement that in the listener. See TestPageLoad.cpp. It implements a crude parser and retrieves child items upon OnDataAvailable:
NS_IMETHODIMP
MyListener::OnDataAvailable(nsIRequest *req, nsISupports *ctxt,
nsIInputStream *stream,
PRUint32 offset, PRUint32 count)
{
//printf(">>> OnDataAvailable [count=%u]\n", count);
nsresult rv = NS_ERROR_FAILURE;
PRUint32 bytesRead=0;
char buf[1024];
if(ctxt == nsnull) {
bytesRead=0;
rv = stream->ReadSegments(streamParse, &offset, count, &bytesRead);
} else {
while (count) {
PRUint32 amount = PR_MIN(count, sizeof(buf));
rv = stream->Read(buf, amount, &bytesRead);
count -= bytesRead;
}
}
if (NS_FAILED(rv)) {
printf(">>> stream->Read failed with rv=%x\n", rv);
return rv;
}
return NS_OK;
}
The important thing is that it calls streamParse(), which looks at the src attribute of img and script elements, and calls auxLoad(), which creates a new channel with a new listener and calls AsyncOpen().
uriList->AppendElement(uri);
rv = NS_NewChannel(getter_AddRefs(chan), uri, nsnull, nsnull, callbacks);
RETURN_IF_FAILED(rv, "NS_NewChannel");
gKeepRunning++;
rv = chan->AsyncOpen(listener, myBool);
RETURN_IF_FAILED(rv, "AsyncOpen");
Since it passes another instance of the MyListener object in there, that listener can also load more child items, ad infinitum, like a Russian-doll situation.
I think I found it (myself); take a close look at this page. Why it doesn't highlight that the UUID has changed over versions isn't clear to me, but it would explain why things fail when (or just prior to) calling QueryInterface on nsIHttpChannelInternal.
With the newer UUID, I'm getting better results. As I mentioned in an update to the question, I've posted this on bugzilla.mozilla.org; I'm curious whether and what response I will get there.

Resources