Aurora vs MemSQL for a low-throughput but low-latency use case [closed]

We are replacing our current user-facing MySQL database. We have a website and a mobile app through which users around the US query our database. The relevant data is contained in three tables, and a join across the three tables is needed to send the relevant results to users.
The results sent back to users are small (<6 KB). If our objective is low latency and throughput is a low priority, which of the two following databases would perform better:
MemSQL or AWS Aurora?
They both have the same starting cost for hardware (~$0.28/hr). We are only considering these two databases at this stage so that we can keep using our in-house MySQL knowledge.
I like that I can outsource the DB headache to Aurora, but surely MemSQL's ability to read and write in memory makes it the lower-latency solution?

Nothing beats in-memory for speed, and this is what MemSQL is built for. It stores tables (in rowstore mode) in memory and uses a custom query engine that compiles queries to an intermediate language and caches the compiled plans so they execute as fast as possible. Aurora is more like a classic disk-based MySQL instance, but with lots of infrastructure changes and optimizations to make the most of Amazon's services.
Before deciding though, you need to figure out what "low-latency" means - is this within seconds or milliseconds?
MemSQL will be faster and most likely in milliseconds depending on your query. Aurora will be slower but can probably deliver sub-second, again depending on your query and the resources allocated and how the data is structured.
Without more details, the answer is to decide what your performance tolerance is and then experiment.
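To make the experiment concrete, here is a minimal benchmark sketch (not part of the original question): it assumes the pymysql driver, placeholder hostnames and credentials, and an invented three-table schema. Since both Aurora and MemSQL speak the MySQL wire protocol, the same script can be pointed at either endpoint.

    # Rough latency benchmark for a MySQL-compatible endpoint (Aurora or MemSQL).
    # Host, credentials, and the three-table join are placeholders for your schema.
    import time
    import pymysql

    QUERY = """
        SELECT a.id, b.detail, c.extra
        FROM table_a a
        JOIN table_b b ON b.a_id = a.id
        JOIN table_c c ON c.b_id = b.id
        WHERE a.user_id = %s
        LIMIT 100
    """

    def measure(host, user, password, database, user_id, runs=100):
        conn = pymysql.connect(host=host, user=user, password=password, database=database)
        latencies = []
        try:
            with conn.cursor() as cur:
                for _ in range(runs):
                    start = time.perf_counter()
                    cur.execute(QUERY, (user_id,))
                    cur.fetchall()
                    latencies.append((time.perf_counter() - start) * 1000)  # milliseconds
        finally:
            conn.close()
        latencies.sort()
        return latencies[len(latencies) // 2], latencies[int(len(latencies) * 0.95)]

    if __name__ == "__main__":
        p50, p95 = measure("db-endpoint-hostname", "app", "secret", "appdb", user_id=42)
        print(f"p50={p50:.1f} ms  p95={p95:.1f} ms")

Run it from the same region your users' traffic originates from, since at these response sizes network round trips will dominate the numbers.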

Related

Design of an HA, consistent, and responsive counter [closed]

Let's say Flipkart launches an exclusive Redmi sale at 12 PM; the stock is 10K, but far more people will access it at the same time. There are advantages and disadvantages to keeping the counter on a single machine versus distributing it. If we keep it in an in-memory data store on a single machine, that machine becomes a bottleneck because many app servers will hit it at the same time, and we have to account for the memory and CPU needed to queue those requests. If it is distributed across nodes and machines access different nodes, we eliminate the bottleneck, but an update on one node has to be made consistent across all nodes, which also affects response time. What would be a good design choice here?
Yes, a single-machine counter really will be a performance bottleneck under intensive load, and a single point of failure as well. I would suggest going with a sharded counter implementation.
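To make that concrete, here is a minimal sketch of a sharded stock counter on Redis; the shard count, the key names, and the choice of Redis itself are assumptions for illustration, not part of the question. The total stock is split across N shards, each reservation atomically decrements one randomly chosen shard, and it falls back to the remaining shards only when its first pick is exhausted.

    # Sketch of a sharded stock counter on Redis. Key names and shard count are illustrative.
    import random
    import redis

    r = redis.Redis(host="localhost", port=6379)
    SHARDS = 16
    KEYS = [f"sale:redmi:stock:{i}" for i in range(SHARDS)]

    def init_stock(total=10_000):
        # Spread the total stock roughly evenly across the shards.
        base, remainder = divmod(total, SHARDS)
        for i, key in enumerate(KEYS):
            r.set(key, base + (1 if i < remainder else 0))

    def reserve_one():
        """Try to reserve one unit. Returns True on success, False when sold out."""
        order = random.sample(range(SHARDS), SHARDS)   # random starting shard avoids a hot spot
        for i in order:
            remaining = r.decrby(KEYS[i], 1)           # atomic decrement on a single shard
            if remaining >= 0:
                return True
            r.incrby(KEYS[i], 1)                       # shard was empty; undo and try the next one
        return False

    def stock_left():
        # Approximate read: sums the shards without locking them.
        return sum(max(int(v), 0) for v in r.mget(KEYS) if v is not None)

Reads of the remaining stock sum the shards and are therefore only approximately consistent, which is usually acceptable for an "items left" display; the decrement path, the part that must be exact, stays on a single shard and never needs cross-node coordination.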

Hosting <10 GB of read-only data with reasonably fast but cheap access [closed]

tl;dr
I currently have a PostgreSQL database with about 10 GB of data. This is "archived" data, so it won't ever change, but I do need it to be queryable/searchable/available for reading as cheaply as possible from my Rails app.
Details:
I'm running a DigitalOcean server, but this is a non-profit project, so keeping costs low is essential. I'm currently using a low-end droplet: 4 GB memory / 40 GB disk / SFO2, Ubuntu 16.04.1 x64.
Querying this data and loading the pages it's used on can occasionally take a significant amount of time. Some pages time out because they take over a minute to load. (Granted, those are very large pages, but still.)
I've been looking at moving the database over to Amazon Redshift, but the base prices seem high, as it's aimed at much larger projects than mine.
Is my best bet to keep putting more work into making the queries small and only rendering small bits at a time? Even basic pages have long query times because the server is slowed down so much. Or is there a service similar to Redshift that will let me query the data quickly while also storing it externally for a reasonable price?
You can try Amazon S3 and Amazon Athena. S3 is simple object storage where you can dump your data as text files, and Athena is a service that provides a SQL interface to data stored on S3. S3 is very cheap, and Athena charges per query run. Since you said your data isn't going to change and will be queried relatively rarely, it's a good fit. Check this out: 9 Things to Consider When Choosing Amazon Athena
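From a Rails app you would use the aws-sdk-athena gem, but the call flow is the same everywhere; here is a minimal sketch in Python with boto3, where the bucket, database, and table names are placeholders.

    # Minimal Athena query via boto3. Bucket, database, and table names are placeholders.
    import time
    import boto3

    athena = boto3.client("athena", region_name="us-west-2")

    def run_query(sql):
        qid = athena.start_query_execution(
            QueryString=sql,
            QueryExecutionContext={"Database": "archive_db"},
            ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
        )["QueryExecutionId"]

        # Poll until the query finishes (Athena is asynchronous).
        while True:
            state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
            if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
                break
            time.sleep(1)

        if state != "SUCCEEDED":
            raise RuntimeError(f"Athena query ended in state {state}")
        return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]

    rows = run_query("SELECT * FROM archived_records WHERE year = 2015 LIMIT 100")
    print(len(rows))

Because Athena bills by data scanned, converting the dump to a columnar, compressed format (Parquet) and partitioning it will keep per-query costs down for a dataset of this size.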

Should I store emails in my database? [closed]

I'm looking to integrate email into a CRM I'm building. For each contact with an email address, I'd like the sent and received emails from a connected IMAP account to be fetched when viewing that contact.
Would one store these emails in the database, or would it be faster/more efficient to fetch them on the fly (when the contact page is accessed via a GET request)?
Would one store these emails in the database or would it be faster/more efficient to fetch these emails on the fly?
Have you measured the performance to find out? Don't prematurely optimize; actually identify the bottlenecks. Set up some tests (large-scale and repeatable... don't just test one email one time) to compare retrieving emails on the fly over IMAP vs. retrieving them from the database. See if there's a significant difference, and include that information in your decision-making process.
Additionally, there are other things to consider when making this decision. Namely:
Will these emails be modified in the scope of a transaction? Databases are good at participating in a transaction scope for a unit of work in your code; third-party services and APIs, not so much. You might want to put the emails in the database if they're needed as part of such a scope. (Though given the description, that's unlikely.)
Duplicating data between multiple systems (multiple "sources of truth") and keeping it synchronized is hard. It introduces a lot of unexpected complexity into a system. You may see a performance gain, but is it worth it? Maybe some application-level caching will yield just as much of a gain without duplicating the data and introducing another dependency into the mix?
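As a sketch of the kind of repeatable test described above, the following compares an on-the-fly IMAP fetch with reading cached copies from a database; the IMAP host, credentials, and table layout are invented for illustration.

    # Rough timing comparison: fetch a contact's emails over IMAP vs. read cached copies
    # from the database. Host, credentials, and schema are illustrative placeholders.
    import time
    import imaplib
    import sqlite3

    def time_imap_fetch(contact_email, runs=20):
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            imap = imaplib.IMAP4_SSL("imap.example.com")
            imap.login("crm-user@example.com", "secret")
            imap.select("INBOX", readonly=True)
            _, data = imap.search(None, 'FROM', f'"{contact_email}"')
            for num in data[0].split()[:50]:            # cap at 50 messages per run
                imap.fetch(num, "(RFC822.HEADER)")
            imap.logout()
            timings.append(time.perf_counter() - start)
        return sum(timings) / len(timings)

    def time_db_fetch(contact_email, runs=20):
        conn = sqlite3.connect("crm.db")
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            conn.execute(
                "SELECT subject, body FROM emails WHERE contact_email = ? LIMIT 50",
                (contact_email,),
            ).fetchall()
            timings.append(time.perf_counter() - start)
        conn.close()
        return sum(timings) / len(timings)

    print("IMAP avg:", time_imap_fetch("alice@example.com"))
    print("DB avg:  ", time_db_fetch("alice@example.com"))

Whatever the raw numbers say, weigh them against the synchronization burden described above before deciding to duplicate the mail store.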

Which graph database? [closed]

Which graph database should I use when dealing with a couple of thousand nodes and a couple of thousand relationships? Are these big numbers for any database or not? Which graph database is fastest at read operations (assuming all data is loaded once at the beginning)?
I had a look at Neo4j and its visualization tool. Will I be able to have such a visualization tool in my application?
The questions you'll need to ask and answer for a graph database are similar to those for any other database. How much data? In memory or persistent? How will you interface with it? Embedded or a server process? Distributed or localized? Licensing?
A couple of thousand nodes and relationships is small for a graph database, and almost any graph database solution will work. For most people Neo4j is a fine choice, but there are some caveats. First, the licensing of Neo4j can be problematic in many situations. Second, the visualizer is part of the Neo4j server process, which means you're going to have another server process running. If you're concerned about the licensing you may want to check out OrientDB, which is under the Apache license and thus very flexible.
From the sounds of it, you have a fairly small system and may be able to get by with TinkerGraph, an in-memory graph database from Marko Rodriguez and the TinkerPop hackers. It has the option to persist your data to a file if needed, is amazingly lightweight, and, like Neo4j and OrientDB, supports all the graph tools from the TinkerPop stack, including the JUNG ouplementation, which can give you the visualizations you desire.

Is there a way I can use 100% of my network bandwidth with only one connection? [closed]

I have a program that reads about a million rows and groups them; the client computer is not stressed at all: no more than 5% CPU usage, and the network card is used at about 10% or less.
If I run four copies of the program on the same client machine, usage grows at the same rate: with the four programs running, I get about 20% CPU usage and about 40% network usage. That makes me think I can improve performance by using threads to read the information from the database, but I don't want to introduce that complexity if a configuration change could achieve the same thing.
Client: Windows 7, CSDK 3.50.TC7
Server: AIX 5.3, IBM Informix Dynamic Server Version 11.50.FC3
There are a few tweaks you can try, most notably setting the fetch buffer size. The environment variable FET_BUF_SIZE can be set to a value such as 32767. This may help you get closer to saturating the client and the network.
Multiple threads sharing a single connection will not help. Multiple threads using multiple connections might help - they'd each be running a separate query, of course.
If the client program is grouping the rows, we have to ask "why?". It is generally best to leave the server (DBMS) to do that. That said, if the server is compute bound and the client PC is wallowing in idle cycles, it may make sense to do the grunt work on the client instead of the server. Just make sure you minimize the data to be relayed over the network.
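If you do decide to experiment with parallel reads, the rough shape is sketched below: FET_BUF_SIZE is set before the client connects, and each worker thread gets its own connection and its own slice of the rows, with the grouping pushed to the server as suggested above. The ODBC DSN, the table, and the partitioning column are placeholders, and pyodbc with an Informix ODBC driver is assumed purely for illustration.

    # Sketch: parallel reads over separate connections, each handling its own slice of the table.
    # The DSN, table, and partitioning column are placeholders.
    import os
    import concurrent.futures

    os.environ["FET_BUF_SIZE"] = "32767"   # the Informix client reads this from the environment at connect time

    import pyodbc

    DSN = "DSN=my_informix_dsn;UID=user;PWD=secret"
    SLICES = 4

    def read_slice(slice_no):
        # Each thread opens its own connection and reads a disjoint slice of the rows,
        # letting the server do the grouping as suggested above.
        conn = pyodbc.connect(DSN)
        try:
            cur = conn.cursor()
            cur.execute(
                "SELECT grp, COUNT(*) FROM big_table "
                "WHERE MOD(id, ?) = ? GROUP BY grp",
                SLICES, slice_no,
            )
            return cur.fetchall()
        finally:
            conn.close()

    with concurrent.futures.ThreadPoolExecutor(max_workers=SLICES) as pool:
        partials = list(pool.map(read_slice, range(SLICES)))

    # Merge the per-slice group counts on the client.
    totals = {}
    for rows in partials:
        for grp, cnt in rows:
            totals[grp] = totals.get(grp, 0) + cnt
    print(len(totals), "groups")

Measure with one connection and a larger fetch buffer first; only add the threading if that alone doesn't move the network utilization.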
