I'm fetching an 84 MB table from an external API using a Python client provided by the vendor (pytd by Treasure Data). I am running this in a 2 GB Cloud Functions environment.
All I do is create a connection and an engine and issue a query. The client uses Presto under the hood.
Fetching this 84 MB table overloads the memory of the 2 GB environment I've been using for this simple task, as shown in the Cloud Function logs. The memory usage graph is below.
I can imagine memory leaks happening, but I'm not sure where to look. I've reviewed the code, and the Cloud Function does have a return value.
I am simply making a call to fetch a reasonably sized table. What could be the issue?
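For reference, a minimal sketch of the kind of call described above, assuming pytd's Client.query() with the Presto engine (credentials, database and table names are placeholders):
'''# Minimal sketch of the call described in the question; all names are placeholders.
import pytd

client = pytd.Client(
    apikey="TD_API_KEY",                      # placeholder credential
    endpoint="https://api.treasuredata.com/",
    database="my_database",                   # placeholder database
    default_engine="presto",                  # the Presto engine mentioned above
)

# query() materialises the whole result set in memory before returning it,
# as a dict with "columns" and "data" (a list of rows), so the in-memory
# footprint of an "84 MB" table can be noticeably larger than 84 MB.
result = client.query("SELECT * FROM my_table")   # placeholder table
print(len(result["data"]), "rows fetched")'''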
I am trying to come up with a monitoring solution for MITRE ATT&CK technique T1115 (Clipboard Data). The data can be retrieved via PowerShell (Get-Clipboard) or via the Windows API (OpenClipboard / GetClipboardData). Script block logging will allow me to detect the PowerShell use, but how can you monitor for those specific API calls?
I have not been able to come up with a solution for tracking specific API calls. The deepest I can drill down is to the process level; tracking specific API calls is a mystery to me.
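For reference, the Windows API path mentioned above looks roughly like this when expressed via ctypes; it is only an illustration of the call sequence that would have to be detected, not a monitoring approach:
'''# Illustrative only: the OpenClipboard/GetClipboardData sequence mentioned above,
# expressed via ctypes. This is what a T1115-style clipboard read looks like;
# it does NOT monitor anything.
import ctypes

CF_UNICODETEXT = 13
user32 = ctypes.windll.user32
kernel32 = ctypes.windll.kernel32
user32.GetClipboardData.restype = ctypes.c_void_p
kernel32.GlobalLock.restype = ctypes.c_void_p
kernel32.GlobalLock.argtypes = [ctypes.c_void_p]
kernel32.GlobalUnlock.argtypes = [ctypes.c_void_p]

user32.OpenClipboard(0)
handle = user32.GetClipboardData(CF_UNICODETEXT)
if handle:
    locked = kernel32.GlobalLock(handle)
    print(ctypes.c_wchar_p(locked).value)     # the clipboard text
    kernel32.GlobalUnlock(handle)
user32.CloseClipboard()'''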
Hi, I am new to Dask and cannot seem to find relevant examples on the topic of this title. I would appreciate any documentation or help on this.
The example I am working with is pre-processing of an image dataset in an Azure environment with the dask_cloudprovider library. I would like to increase the processing speed by dividing the work across a cluster of machines.
From what I have read and tested, I can
(1) load the data into memory on the client machine and push it to the workers, or
'''# pseudo-code: load the data into an array on the client,
# then send it to the workers through a delayed function'''
(2) establish a link between every worker node and the data storage (see the function below), and access the data at the worker level.
'''import adlfs
import cv2
import imageio

def get_remote_image(img_path):
    # Placeholder credentials for the Blob Storage account
    ACCOUNT_NAME = 'xxx'
    ACCOUNT_KEY = 'xxx'
    CONTAINER = 'xxx'
    # Each call opens the container and reads a single image directly from Blob Storage
    abfs = adlfs.AzureBlobFileSystem(account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY, container_name=CONTAINER)
    file = abfs.cat(img_path)
    image = imageio.core.asarray(imageio.imread(file, "PNG"))
    return cv2.cvtColor(image, cv2.COLOR_RGB2BGR)'''
What I would like to know more about is whether there are any best practices for accessing and working with data on a remote cluster using Dask?
If you were to try version (1), you would first see warnings saying that sending large delayed objects is a bad pattern in Dask, as it makes for large graphs and high memory use on the scheduler. You can send the data directly to workers using client.scatter, but it would still be an essentially serial process, bottlenecked on receiving and sending all of your data through the client process's network connection.
The best practice and canonical way to load data in Dask is for the workers to do it themselves. All the built-in loading functions work this way, and this is true even when running locally (because any download or open logic should be easily parallelisable).
This is also true for the outputs of your processing. You haven't said what you plan to do next, but to grab all of those images to the client (e.g., .compute()) would be the other side of exactly the same bottleneck. You want to reduce and/or write your images directly on the workers and only handle small transfers from the client.
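As a hedged sketch of that worker-side pattern, reusing the question's get_remote_image and assuming a dask.distributed client connected to your dask_cloudprovider cluster (process_image and the image paths are placeholders):
'''# Sketch only: worker-side loading and processing; placeholder names are assumptions.
import dask
from dask.distributed import Client

client = Client("tcp://scheduler-address:8786")   # or the cluster object from dask_cloudprovider

def process_image(img):
    return img.mean()                             # placeholder for the real pre-processing step

@dask.delayed
def load_and_process(img_path):
    img = get_remote_image(img_path)              # the question's function; runs on the worker
    return process_image(img)

img_paths = ["images/0001.png", "images/0002.png"]  # placeholder paths in the container
tasks = [load_and_process(p) for p in img_paths]
results = dask.compute(*tasks)                    # only small per-image results return to the client
print(results)'''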
Note that there are examples out there of image processing with dask (e.g., https://examples.dask.org/applications/image-processing.html ) and of course a lot about arrays. Passing around whole image arrays might be fine for you, but this should be worth a read.
I'm currently working on a new Java application which uses an embedded Neo4j database as its data store. Eventually we'll be deploying to a cloud host which has no persistent data storage available - we're fine while the app is running but as soon as it stops we lose access to anything written to disk.
Therefore I'm trying to come up with a means of persisting data across an application restart. We have the option of capturing any change commands as they come into our application and writing them off somewhere, but that means retaining a lifetime of changes and applying them in order as an application node comes back up. Is there any functionality in Neo4j or SDN that we could leverage to capture changes at the Neo4j level and write them off to an AWS S3 store or the like? I have had a look at Neo4j clustering, but I don't think that will work either, at a technical level (limited protocol support on our cloud platform) or because of the cost of an Enterprise licence.
Any assistance would be gratefully accepted...
If you have an embedded Neo4j, you should already know where in your code you are performing update/create/delete queries in Neo4j, no?
To answer your question: Neo4j has a TransactionEventHandler (https://neo4j.com/docs/java-reference/current/javadocs/org/neo4j/graphdb/event/TransactionEventHandler.html) that captures every transaction and tells you which nodes/relationships have been added, updated or deleted.
In fact, it's the way triggers are implemented in Neo4j.
But in your case I would consider the following:
use another cloud provider that allows you to have persistent storage
if that is not possible, implement a hook on application shutdown that copies the graph.db folder to external storage (and do the opposite at startup)
use Neo4j as a remote server, and install it on a cloud provider with persistent storage.
We are following the embedded architecture for our S/4HANA 1610 system.
Please let me know what the impact on the server will be if we implement 200+ standard Fiori apps in our system.
Regards,
Sayed
When you say “server”, are you referring to the ABAP backend, consisting of one or more SAP application servers and usually one database server?
In this case, you might get an initial impression using transaction ST03.
Here, you get a detailed analysis of resource consumption on the SAP application server.
You also get information about database access times, as seen from the application server.
This can give you a good hint about resource consumption on the database server.
Usually, the ABAP backend is accessed from Fiori via OData calls.
Not every user interaction causes an OData call, some interactions are handled locally at the frontend.
In general, implemented apps only require some space on the hard disk, as long as nobody is using them.
So the important questions for defining the expected workload are (a rough sketch of how these inputs combine follows the list):
How many users are working with these apps, and at what frequency (i.e. average think time)?
How many OData calls are sent from these apps to the backend, and how many dialog steps are handled by the frontend itself?
How expensive are these OData calls (see ST03)?
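As a rough back-of-the-envelope illustration of how these inputs combine (all numbers below are hypothetical assumptions, not measurements; real sizing should be based on ST03 data and the SAP sizing guidelines):
'''# Hypothetical workload estimate; every number is an assumption for illustration only.
active_users          = 200    # assumed concurrently active Fiori users
dialog_steps_per_hour = 60     # assumed steps per user per hour (avg. think time ~60 s)
odata_share           = 0.5    # assumed share of dialog steps that actually reach the backend
cpu_s_per_odata_call  = 0.3    # assumed avg. CPU seconds per OData call (take this from ST03)

odata_calls_per_hour = active_users * dialog_steps_per_hour * odata_share
backend_cpu_hours    = odata_calls_per_hour * cpu_s_per_odata_call / 3600

print(f"{odata_calls_per_hour:.0f} OData calls/hour, "
      f"~{backend_cpu_hours:.2f} CPU-hours of backend processing per hour")'''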
Every app reflects one or more typical business processes, which need to be defined.
Your specific Customizing also plays an important role, because it controls different internal functionality.
It's also mandatory to optimize database access, because in productive use tables grow in size, which might slow down database access over time.
Usually, this kind of sizing is done by SAP Hardware and Technology partners.
I'm a little confused by the memory use of my WCF service. Brief overview: my WCF service is an OData provider that allows my iPad application to talk to our SQL Server database.
The problem is that when a client (an iPad device using an Objective-C OData library) calls for a simple set of data (say, get all customers from the database), the memory of the w3wp process goes up by a few MB and never really comes back down. Given that all the client wants to do is one-off calls (retrieve a data set, update a data set, delete a data set), once it has finished its call the memory it used for that action should be relinquished. This is not the case at all. I gather there is some caching happening, or maybe the calling instance is not being disposed.
Can anybody steer me in the right direction so that w3wp stays lean and releases the memory after the call has completed?
Thanks in advance
Does your database reside on the same machine as your web server? If your indexes are not properly applied, you will end up consuming a lot of resources. If you are using MS SQL Server, check the minimum memory setting for the server; once the minimum memory limit is reached, MS SQL Server will probably not free it up until restarted. You should also take a look at your binding configuration: if you use a stateful (session) binding and do not close the session, the service instance is going to stay in memory for 10 minutes (the default) waiting for new client requests from the same proxy object.