Do Idle Snowflake Connections Use Cloud Services Credits?

Motivation | Suppose one wanted to execute two SQL queries against a Snowflake DB, ~20 minutes apart.
Optimization Problem | Which would cost fewer cloud services credits:
1. Re-using one connection, and allowing that connection to idle in the interim.
2. Connecting once per query.
The documentation indicates that authentication incurs cloud services credit usage, but does not indicate whether idle connections incur credit usage.
Question | Does anyone know whether idle connections incur cloud services credit usage?

Snowflake connections are stateless. They do not hold a server-side resource, and unlike many other databases they do not need to keep a TCP/IP connection alive.
Therefore idle connections do not consume any Cloud Services Layer credits unless you enable CLIENT_SESSION_KEEP_ALIVE.
https://docs.snowflake.com/en/sql-reference/parameters.html#client-session-keep-alive
When you set CLIENT_SESSION_KEEP_ALIVE, the client periodically refreshes the session token so it does not expire (the heartbeat frequency defaults to 1 hour).
https://docs.snowflake.com/en/sql-reference/parameters.html#client-session-keep-alive-heartbeat-frequency
As Peter mentioned, cloud services usage up to 10% of daily warehouse usage is free, so refreshing the tokens will not cost you anything in practice.
About your approaches: I do not know how many queries you plan to run daily, but creating a new connection for each query can be a performance killer. From a cost perspective, an idle connection makes at most 24 authorization requests per day, so if you plan to run more than 24 queries a day, I suggest you pick the first approach (a sketch follows below).
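To illustrate the first approach, here is a minimal sketch using the Python connector (assuming the snowflake-connector-python package; the account and credential values are placeholders):

```python
import time
import snowflake.connector

# Approach 1: one connection, reused for both queries, idling in between.
conn = snowflake.connector.connect(
    account="my_account",            # placeholder
    user="my_user",                  # placeholder
    password="my_password",          # placeholder
    warehouse="my_wh",               # placeholder
    client_session_keep_alive=True,  # refresh the session token while idle
)
try:
    cur = conn.cursor()
    cur.execute("SELECT 1")   # first query
    cur.close()

    time.sleep(20 * 60)       # connection idles for ~20 minutes

    cur = conn.cursor()
    cur.execute("SELECT 2")   # second query on the same session
    cur.close()
finally:
    conn.close()
```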

Even if idle connections do not cost anything in terms of Cloud Services, is your warehouse left running with those idle connections, giving you other costs to consider? I am guessing there are more factors to consider overall, which you can discuss with your Snowflake Account Team. Not trying to dodge your question, just trying to give a more complete answer!
In general, Cloud Services costs are typically on the lower side compared to your other costs. Here are the main drivers for cloud services costs and how to minimize them: https://community.snowflake.com/s/article/Cloud-Services-Billing-Update-Understanding-and-Adjusting-Usage
The best advice you may get is to test your connections/workflows and compare the costs over time. The overall costs are going to depend on several factors. Even if there's a difference in costs between two workflows, you may still have to analyze the cost/output ratio and your business needs to determine if it's worth the savings.

Approach 1 will incur less cloud services usage, but more data transfer charges (to keep the connection alive). Only the Auth event incurs cloud services usage.
Approach 2 will incur more cloud services usage, but less data transfer charges.
However, the cloud services usage and data transfer charges are extremely small in either case.
Note - any cloud services used (up to 10% of daily warehouse usage) are free, whereas there is no free bandwidth allocation, so using #2 may save you a few pennies.
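For comparison, a minimal sketch of #2 (a fresh connection, and therefore a fresh authentication event, per query) might look like this; again, the package and credential values are placeholders:

```python
import snowflake.connector

def run_query(sql):
    # One connection (and one auth event) per query, closed immediately after.
    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="my_password",  # placeholders
    )
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()

run_query("SELECT 1")
# ~20 minutes later, from a scheduler or a second invocation:
run_query("SELECT 2")
```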

Related

Will Google Cloud Run support GPU/TPU some day?

So far Google Cloud Run supports only CPU. Is there any plan to support GPU? It would be super cool if GPU were available; then I could demo the DL project without running a really expensive GPU instance.
I seriously doubt it. GPU/TPUs are specialized hardware. Cloud Run is a managed container service that:
Enables you to run stateless containers that are invokable via HTTP requests. This means that CPU-intensive applications are not supported. In between HTTP request/response cycles the CPU is idled to near zero, so your expensive GPU/TPUs would sit idle.
Autoscales based upon the number of requests per second. Launching 10,000 instances in seconds is easy to achieve. Imagine the billing support nightmare for Google if customers could launch that many GPU/TPUs and the size of the bills.
Is billed in 100 ms time intervals. Most requests fit into a few hundred milliseconds of execution. This is not a good execution or business model for CPU/GPU/TPU integration.
Provides a billing model which significantly reduces the cost of web services to near zero when not in use. You just pay for the cost of storing your container images. When an HTTP request is received at the service URL, the container image is loaded into an execution unit and request processing resumes. Once requests stop, billing and resource usage also stop.
GPU/TPU types of data processing are best delivered by backend instances that protect and manage the processing power and costs that these processor devices provide.
You can use GPU with Cloud Run for Anthos
https://cloud.google.com/anthos/run/docs/configuring/compute-power-gpu

Is DataSnap Optimized for responding to more than 1k users at the same time?

We want to start a big multi-tier application. The server-side application must respond to more than 1000 users at the same time. We want to build the server application with the 64-bit compiler and the client side with the 32-bit compiler. We do not know whether DataSnap can serve all of these clients without problems.
The server computer is very powerful (multi-processor, more than 16 GB of RAM) and the database management system is Firebird 2.5.
You need a way to perform realistic load tests.
For the Firebird database, you can simulate concurrent users with the free Apache JMeter tool. It can run SQL statements and record their execution time statistics (average, min/max etc.). So you could for example create a thread group with twenty different SQL queries, and then run twenty threads which each will perform these queries sequentially.
JMeter lets you define time limits for the SQL queries and treats it as an error if a query exceeds its limit. Then you can try to find the maximum client count where the overall error rate is still less than (for example) five percent.
But you also need to know how high the expected database load will be, and you will also need to have a test database with a realistic size, not only a couple of records. Also, some database queries like reports might cause higher load - these should be included in the simulation too, as they can affect overall performance. In JMeter, you can create a second thread group, running in parallel with the first one, for these long-running statements with different settings (less simulated clients).
Testing the database will show if there is a bottleneck already in this area. For example, the test result could be that the database can serve twenty clients with a total average transaction rate of 20 TPS (transactions per second), which means one client executes one transaction per second. But this TPS value will decrease with higher user count.
Related question: Firebird usage in big projects which also has a link to http://www.firebirdsql.org/en/case-studies-catalog/
Regarding DataSnap client load simulation: this can be done with a scripted client, which runs a predefined set of statements / commands over the connection.
To run a high number of load-test clients simultaneously you could use a service like Amazon Elastic Compute Cloud (EC2) to launch clones of your test machine image, saving you hardware costs. But of course I would start with a small client machine that simply runs ten or twenty scripted clients.
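As a rough illustration of the scripted-client idea, here is a minimal Python sketch; the run_script() body is a placeholder for whatever statements your client actually sends over its DataSnap or Firebird connection:

```python
# Minimal scripted-client load test: N threads, each running the same
# statement script in a loop and recording per-call latency.
import statistics
import threading
import time

NUM_CLIENTS = 20
ITERATIONS = 50
latencies = []
lock = threading.Lock()

def run_script():
    """Placeholder for the real work: execute the predefined statements
    over a DataSnap or Firebird connection and return when done."""
    time.sleep(0.05)  # stand-in for the actual round trip

def client():
    for _ in range(ITERATIONS):
        start = time.perf_counter()
        run_script()
        elapsed = time.perf_counter() - start
        with lock:
            latencies.append(elapsed)

threads = [threading.Thread(target=client) for _ in range(NUM_CLIENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"calls={len(latencies)} avg={statistics.mean(latencies):.3f}s "
      f"max={max(latencies):.3f}s")
```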
As far as I know DataSnap is based on Indy, and Indy's connection handling model is not very scalable - one thread per connection, which is very resource consuming. Even using Indy's thread pools is not an option, I think... Also, in Windows (32-bit) for example there is a limit on the maximum number of threads you can create (2000 IIRC). Anyway - using many threads is not good and hurts server performance (for reference - the Windows Internals book, the Windows Performance Team blog, etc.)
A scalable, robust and professional application server would use IO Completion ports (IOCP) for data processing. But I don't know if DataSnap can take advantage of it.
UPDATE:
At CodeRage 7 I asked similar scalability questions. Here are the answers:
Q: Recently there was a question on StackOverflow about DataSnap's scalability/performance. So can DS handle, for example, 2000 or more concurrent user requests at the network and application level?
A: The scalability is based on the scalability of TCP/HTTP/HTTPS and the number of connections allowed in your server operating system, and also on the memory and hardware you employ. There is no specific limit in DataSnap.
My comment: While this is true, Indy's connection handling model, i.e. one thread per connection, introduces a bottleneck, especially on 32-bit Windows (2000 threads max). On Win64 it should not be as much of a problem, but again - this kind of data flow handling leads to performance degradation.
Q: Does DataSnap support some kind of load balancing?
A: Not directly. You can do this in code in your DataSnap server(s).
My comment: I've found a very good paper on implementing Failover/Load Balancing in DataSnap on Andreano Lanusse's blog.
Q: Does DataSnap support IO Completion ports for better scalability?
This question of mine was left unanswered.
Hope this helps!
UPDATE2:
I found a very interesting post on DataSnap performance: DataSnap analysis based on Speed & Stability tests
UPDATE3:
DataSnap, Deployment, Performance, and More (Marco Cantu)
Monitoring and control of connections in DataSnap XE2 - translated in English
Monitoring and control of connections in DataSnap XE2 - original
When the specifications for a system are written, you need to be very precise about what "multiple users" means.
For example: you create a website, and the client expects 15,000 unique users.
Then the client usually comes up with a requirement that the system needs to support 15,000 simultaneous users, which is very naive.
You'll need a more detailed specification than that.
Usually it's more sensible to say something like: for 99% of requests, 99% of users get a response within 5 seconds on average.
In normal usage, you'll never see all users send a request within the same second. If at some point they all arrive within the same minute (also very unlikely), you'll have a lot fewer concurrent users.
Even for websites with tens of thousands of users, where most of them connect daily, the web server is idle most of the time; once in a while it jumps to 5%, or in extreme cases to 20%. If we really had to serve all of these users at once we'd be screwed, but that never happens, and it's not realistic to prepare a server for such loads.
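A back-of-the-envelope calculation makes this concrete; the numbers below are made up purely for illustration:

```python
# Rough concurrency estimate from daily users (illustrative numbers only).
daily_users = 15_000          # unique users per day
requests_per_user = 10        # average requests each user makes
peak_factor = 5               # peak hour carries ~5x the average hourly load
avg_response_time_s = 2.0     # average time a request occupies the server

requests_per_day = daily_users * requests_per_user
avg_req_per_s = requests_per_day / (24 * 3600)
peak_req_per_s = avg_req_per_s * peak_factor

# Little's law: concurrent requests ~= arrival rate * time in system
concurrent_requests = peak_req_per_s * avg_response_time_s
print(f"avg={avg_req_per_s:.1f} req/s, peak={peak_req_per_s:.1f} req/s, "
      f"~{concurrent_requests:.0f} concurrent requests")
```

With these assumptions, 15,000 daily users translates to well under a hundred truly concurrent requests, which is why "15,000 simultaneous users" is rarely a meaningful requirement.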

Defining Scaling Threshold for Azure Web Roles

Azure embraces the notion of elastic scaling and I've been able to achieve this with my Worker Roles. However, when it comes to my Web Roles (e.g. MVC apps) I am not sure what to monitor (or how) to determine when it's a good time to increase (or decrease) the number of running instances. I'm assuming I need to monitor one or many performance counters but am not sure where to start.
Can anyone recommend a best practice for assessing an MVC Web Role instance's load relative to scaling decisions?
This question is a bit open-ended, as monitoring is typically app-specific. Having said that:
Start with simple measurements that you'd look at on a local server, representing KPIs for your app - for instance, network utilization. This TechNet article describes performance counters collected by System Center for Windows Azure, such as:
ASP.NET Applications Requests/sec
Network Interface Bytes Received/sec
Network Interface Bytes Sent/sec
Processor % Processor Time Total
LogicalDisk Free Megabytes
LogicalDisk % Free Space
Memory Available Megabytes
You may also want to watch # of requests queued and request wait time.
Network utilization is interesting, since your NIC provides approx. 100Mbps per core and could end up being a bottleneck even when CPU and other resources are underutilized. You may need to scale out to more instances to handle high-bandwidth scenarios.
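As a rough illustration of how quickly the NIC can become the limit, here is a small back-of-the-envelope check; the response size and core count are assumptions:

```python
# Rough NIC bandwidth check (illustrative numbers).
nic_mbps_per_core = 100       # approx. bandwidth per core
cores_per_instance = 2        # assumed instance size
avg_response_kb = 200         # assumed average HTTP response size

max_mbps = nic_mbps_per_core * cores_per_instance
max_responses_per_s = (max_mbps * 1000 / 8) / avg_response_kb
print(f"~{max_responses_per_s:.0f} responses/s before the NIC saturates")
```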
Also: I tend to give less importance to CPU utilization, even though it's so easy to measure (and shows up so frequently in examples). Running a CPU at near capacity is a good thing usually, since you're paying for it and might as well use as much as possible.
As far as decreasing: this needs to be handled a bit more carefully. Windows Azure compute is billed by the hour. If, say, you scale out to an extra instance at 11:50 and scale in again at 12:10, you've just incurred two compute-hours. Also: you don't want to scale out, then take new measurements and decide you can now scale back again (effectively creating a constant pulse of adding and removing instances). To make things easier, consider the Autoscaling Application Block (WASABi), found in the Enterprise Library. This has scale rules like the ones I just mentioned baked in and is very straightforward to use.
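To make the "constant pulse" point concrete, here is an illustrative sketch of a threshold rule with a cool-down period. This is not WASABi (which expresses similar rules declaratively); the counter and thresholds are assumptions:

```python
# Illustrative scale-decision rule: thresholds plus a cool-down so instances
# are not added and removed in a constant pulse.
import time

SCALE_OUT_QUEUE = 50      # assumed threshold: requests queued
SCALE_IN_QUEUE = 5
COOLDOWN_S = 20 * 60      # wait at least 20 minutes between scaling actions

last_action = 0.0

def decide(requests_queued, instance_count, min_instances=2, max_instances=8):
    global last_action
    now = time.time()
    if now - last_action < COOLDOWN_S:
        return instance_count                      # still cooling down
    if requests_queued > SCALE_OUT_QUEUE and instance_count < max_instances:
        last_action = now
        return instance_count + 1                  # scale out
    if requests_queued < SCALE_IN_QUEUE and instance_count > min_instances:
        last_action = now
        return instance_count - 1                  # scale in
    return instance_count
```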

Erlang fault-tolerant application: PA or CA of CAP?

I have already asked a question regarding a simple fault-tolerant soft real-time web application for a pizza delivery shop.
I have gotten really nice comments and answers there, but I disagree that it is a true web service. Rather than a web service, it is more of a real-time system to accept orders from customers, control the dispatching of these orders, and control the vehicles that deliver those orders in real time.
Moreover, unlike a 'true' web service this system is not intended to have many users - it is just a few dispatchers (telephone operators) and a few delivery drivers that will use it (as for now I have no requirement to provide direct access to the service to the actual customers; only the dispatchers and delivery drivers will have the direct access).
Hence this question is a bit more general.
I have found that, in order to make the right choice of a NoSQL data storage option for this application, the first thing I have to do is choose between CA, PA, and CP according to the CAP theorem.
Now, the Building Web Applications with Erlang book says that "while it [Mnesia] is not a SQL database, it is a CA database like a SQL database. It will not handle network partition". The same book says that the CouchDB database is a PA database.
With that in mind, I think the very first thing I need to do for my application is decide what the term "fault tolerance" means with regard to CAP.
The simple requirement that I have is for the application to be available 24/7 (R1). The other one is that there is no need to scale; the application will have a very modest number of users (it is probably not possible to have thousands of dispatchers) (R2).
Now, does R1 require the application to provide Consistency, Availability and Partition Tolerance and with what priorities?
What type of data storage option will better handle the following issues:
1. Providing 24/7 availability for a dispatcher (a person who accepts phone calls from customers and who uses a CRM) to look up customer records and put orders into the system;
2. Looking up currently served orders and their status (placed, baking, dispatched, delivering, delivered) in real time;
3. Keeping track of all working vehicles' locations and their payloads in real time;
4. Recovering any part of the system after a system crash or network crash to continue providing 1, 2 and 3.
To sum it up: What kind of Data Storage (CA, PA or CP) will suite the system described above better? What kind of Data Storage will better satisfy the R1 requirement?
For your 24/7 requirement you are looking for a database with (high) availability, because you want your requests to succeed every time (even if they only return error results).
A netsplit would bring your whole system down if you have no partition tolerance.
Consistency is nice to have, but you can only have 2 of 3.
Your best bet will be a PA solution. I highly recommend a solution inspired by Amazon Dynamo. The best-known Dynamo implementations are Riak and CouchDB. Riak even allows you to change PA to some other form by tuning the read and write replicas.
First, don't confuse CAP "Availability" with "High Availability". They have nothing to do with each other. The A in CAP simply means "All DB nodes can answer queries". To get High Availability, you must be in multiple data centers, you must have robust documented procedures for maintenance, expansion, etc. None of that depends on your CAP choice.
Second, be realistic about your requirements. A stock-trading application might have a requirement for 100% uptime, because every second of downtime could lose millions of dollars. On the other hand, I'm guessing your pizza joint might lose tens of dollars for every minute it's down. So it doesn't make sense to spend millions trying to keep it up. Try to compute your actual costs.
Third, always evaluate your choice against the mainstream. You could just go CA (MySQL) and quickly fail over to the slaves when problems happen. Be realistic about the costs (and risks) of building on new technology. If you really expect your system to run for 5 years without downtime, ask for proof that someone else has run that database for 5 years without downtime.
If you go "AP" and have remote people (drivers, etc.), then you'll need to write an app that stores their data on their phone and sends it in the background (with retries). Of course, you could do this regardless of whether your database was CA or AP.
If you want high uptimes, you can either:
Increase MTBF (Mean Time Between Failures) - buy redundant power supplies, dual Ethernet cards, etc.
Decrease MTTR (Mean Time To Recovery) - just make sure that when failure happens you can recover quickly (fail over to a slave).
I've seen people spend tens of thousands of dollars on MTBF, only to be down for 8 hours while they restore their backup. It makes more sense to ensure MTTR is low before attacking MTBF.
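As a quick worked example of why MTTR matters: steady-state availability is MTBF / (MTBF + MTTR), so with illustrative numbers:

```python
# Availability from MTBF and MTTR (illustrative numbers).
mtbf_hours = 1000.0   # mean time between failures
mttr_hours = 8.0      # mean time to recovery (e.g. restoring from backup)
availability = mtbf_hours / (mtbf_hours + mttr_hours)
print(f"{availability:.4%}")  # ~99.21%; cutting MTTR to 1 hour gives ~99.90%
```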

Berkeley DB replication: upper limit on number of replicants?

I'm considering using Berkeley DB to cache some data on an application cluster. What's a reasonable upper limit on the number of nodes I can plan on Berkeley DB handling? Writing to the database will be from a single node.
Mark,
Most of our customers are using replication groups of 5-20 nodes, although we have some large customers running with much larger replication groups. There is no inherent limit built into Berkeley DB.
The real world limit will depend on your read/write workload mix, how you configure your replication system and the amount of CPU cycles available on the master system. Basically, the master needs to communicate with each replica (send log records, process acknowledgements, respond to requests, etc.). Each replica that communicates with the master adds a small amount of overhead. For a mostly read/occasionally write workload, the master doesn't have to communicate that often and communicating with the replicas requires minimal processing. On a predominantly write workload, the master is actively communicating with the replicas and incurs a more significant workload per replica. You can reduce the workload on the master by funneling read operations to the replicas and by utilizing the Berkeley DB HA client-to-client synchronization feature.
Your mileage will vary, so the best approach is to test a prototype of your application and evaluate the balance of throughput, application requirements and available CPU cycles. Do you have a sense of how many nodes you expect to need in your replication groups?
Regards,
Dave
PS: The Getting Started with Replication Guide is a good place to start.
