Where can I find my SimpleDB domain sizes - amazon-simpledb

The documentation says that the maximum SimpleDB domain size is 10 GB. However, I can't find anywhere to determine what my current domain sizes are.

You can find that info by making a SimpleDB API call to DomainMetadata, passing the name of the domain as a parameter.
To determine how close you are to the 10GB size limit, take the sum of the response values:
ItemNamesSizeBytes + AttributeNamesSizeBytes + AttributeValuesSizeBytes
For reference, here is the DomainMetadata documentation.
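For illustration, here is a minimal sketch using boto3 (assuming AWS credentials are configured; the region and the domain name my-domain are just placeholders):

    import boto3

    # SimpleDB client; region and domain name are placeholders.
    sdb = boto3.client("sdb", region_name="us-east-1")

    meta = sdb.domain_metadata(DomainName="my-domain")

    # Sum the three size fields to compare against the 10 GB domain limit.
    used_bytes = (
        meta["ItemNamesSizeBytes"]
        + meta["AttributeNamesSizeBytes"]
        + meta["AttributeValuesSizeBytes"]
    )
    print(f"Domain uses {used_bytes / 1024 ** 3:.2f} GB of the 10 GB limit")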

Related

Distributed Hash Tables. Do I need shortcuts (even in a large system)?

I am learning about DHTs and I came upon shortcuts. It seems they are used to make routing faster and skip going back along the chain, for better performance. What I don't understand is this: suppose we have a circular DHT made up of 100 servers/nodes/HTs. A lookup for some key arrives at server/node/HT 10 and must be forwarded to server/node/HT 76. When the destination is reached and the value is found, couldn't I just provide the IP of the requester (server 10) so the destination sends the value directly back to 10? That seems to make shortcuts useless.
Thank you in advance.
Edit: useless for returning the value, not for getting to it.
You're assuming a circular network layout and forwarding-based routing, both of which only apply to a subset of DHTs.
Anyway, the forward path would still go through all the nodes, any of which might be down or have transient network problems. As the number of hops goes up, so does the cumulative error probability. Additionally, it increases latency, which matters on a global scale because at least simple DHT routing algorithms don't account for physical proximity.
For the return path it can also matter whether reachability is asymmetric, e.g. due to firewalls.
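To put a rough number on the hop-count argument (a back-of-the-envelope sketch; the per-hop success probability of 0.999 is just an assumed figure):

    # If each hop succeeds with probability p, a lookup that traverses n hops
    # succeeds with probability p ** n, so reliability decays with every hop.
    p = 0.999  # assumed per-hop success probability
    for hops in (1, 5, 25, 50):
        print(f"{hops:>2} hops -> success probability {p ** hops:.3f}")

Walking half of a 100-node ring costs about 50 hops, while shortcut-based routing needs only O(log N) of them, which is the main reason shortcuts still pay off on the forward path.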

MS Graph API: What is subscription ID max length?

What is the max length of the MS Graph "subscription id" property?
In the examples the id is 36 characters long (e.g. "7f105c7d-2dc5-4530-97cd-4e7ae6534c07").
Will it always be like this? I can't find any info in the documentation.
The documentation doesn't explicitly state it is a UUID... though it certainly looks like one, probably is one, and will most likely always be one. However, IMHO, unless you really have problems in terms of storage, it is best to reserve a reasonable size and treat this ID as an "opaque string" that you just store and assume is unique (so you can make a key of it, or build an index on it, if you are using a database as the storage). If there are other reasons why you need to know the size, please clarify...
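As a minimal sketch of the "opaque string" approach (the 64-character cap is just an assumed "reasonable size", not a documented limit; the sample id is the one from the question):

    import sqlite3

    # Store the Graph subscription id as an opaque, unique string key rather
    # than parsing it as a UUID or assuming a fixed 36-character length.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE graph_subscriptions ("
        " subscription_id TEXT PRIMARY KEY"
        "   CHECK (length(subscription_id) <= 64),"  # assumed generous cap
        " resource TEXT)"
    )
    conn.execute(
        "INSERT INTO graph_subscriptions VALUES (?, ?)",
        ("7f105c7d-2dc5-4530-97cd-4e7ae6534c07", "me/messages"),
    )
    conn.commit()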

MS Graph API: Change page size after expanding children using query params

I'm using the MS Graph API to expand children for their name and downloadURL. This is working very well:
/path/?$expand=children($select=name,content.downloadUrl)
I want to increase the page size from the default 200 to 999 (or whatever max size it will allow). Reading the MS Graph docs, I learned that I can use $top=(int) to change the max page size.
I've tried this:
/path/?$expand=children($top=999&$select=name,content.downloadUrl)
And this:
/path/?$expand=children($select=name,content.downloadUrl;top=999)
But neither of these solutions works. I also tried replacing top=999 with something smaller like top=3, but that doesn't work either and always returns 200 children. It's as if the "top" isn't even applied.
Any help for this? Thanks!
You cannot control the page size in $expand. Expand should be used for situations where you want a sample set of the underlying data rather than the complete data set. It's generally best to think of it as a quick way to get the first page of data.
More importantly, you really don't want a REST API to give you "whatever the max size". HTTP may be super flexible but it is not optimal for moving large payloads and, as a result, performance will be horrible.
For optimal performance, you should try to keep your page sizes around 100 records (smaller is better) and process each page of data as you receive it.
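A sketch of that pattern, paging the children collection directly instead of inside $expand (the token and drive path are placeholders; the page size of 100 follows the advice above):

    import requests

    token = "<access-token>"  # placeholder
    url = (
        "https://graph.microsoft.com/v1.0/me/drive/root/children"
        "?$select=name,content.downloadUrl&$top=100"
    )

    while url:
        resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
        resp.raise_for_status()
        page = resp.json()

        # Process each page as it arrives instead of accumulating everything.
        for child in page["value"]:
            print(child.get("name"))

        # Graph returns @odata.nextLink until the last page has been served.
        url = page.get("@odata.nextLink")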

Redis optimal hash set entry size

I have some questions regarding the optimal entry size setting for Redis hash sets.
1. In this example (memory-optimization) they use 100 hash entries per key, but set hash-max-zipmap-entries 256. Why not hash-max-zipmap-entries 100 or 128?
2. On the Redis website (above link) they used a max hash entry size of 100, but in this Instagram post they mention 1000 entries. So does this mean the optimal setting is a function of the product of hash-max-zipmap-entries and hash-max-zipmap-value? (i.e. in this case Instagram has smaller hash values than the memory-optimization example?)
Your comments/clarifications are much appreciated.
The key is, from here:
manipulating the compact versions of these [ziplist] structures can become slow as they grow longer
and
[as ziplists grow longer] fetching/updating individual fields of a HASH, Redis will have to decode many individual entries, and CPU caches won’t be as effective
So to your questions
1. This page just shows an example and I doubt the author gave much thought to the exact values. In real life, IF you wanted to take advantage of ziplists, and you knew your number of entries per hash was <100, then setting it at 100, 128 or 256 would make no difference. hash-max-zipmap-entries is only the LIMIT above which you're telling Redis to change the encoding from ziplist to hash.
2. There may be some truth in your "product of hash-max-zipmap-entries & hash-max-zipmap-value" idea, but I'm speculating. More importantly, first you have to define "optimal" based on what you want to do. If you want to do lots of HSETs/HGETs in a large ziplist, it will be slower than if you used a hash. But if you never get/update single fields and only ever do HMSET/HGETALL on a key, large ziplists won't slow you down. The Instagram 1000 was THEIR optimal number based on THEIR specific data, use cases, and Redis function call frequencies.
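If you want to see where that limit kicks in, here is a quick sketch with redis-py against a local instance (note that recent Redis versions report listpack instead of ziplist and alias the setting to hash-max-listpack-entries):

    import redis

    r = redis.Redis()  # assumes a local Redis on the default port

    # Pin the threshold so the experiment is predictable.
    r.config_set("hash-max-ziplist-entries", 128)

    r.delete("h")
    for i in range(100):
        r.hset("h", f"field{i}", "x")
    print(r.object("encoding", "h"))  # compact encoding (ziplist/listpack)

    for i in range(100, 200):
        r.hset("h", f"field{i}", "x")
    print(r.object("encoding", "h"))  # hashtable once the limit is exceeded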
You encouraged me to read both links, and it seems that you are asking about the "default value for hash table size".
I don't think that it's possible to say that one number is universal for all possibilities. The described mechanism is similar to standard hash mapping. Look at http://en.wikipedia.org/wiki/Hash_table
If you have a small hash table, many different hash values point into the same array slot, and the equals method is used to find the item.
On the other hand, a large hash table means allocating a lot of memory along with many empty slots. But this scales well, as lookup is O(1) and there is no equals search for the item.
In general, the size of the table IMHO depends on the overall count of elements you expect to put into the table, and also on the diversity of the keys. I mean, if every hash starts with "0001", not even size=100000 would help you.

epoll_create and epoll_wait

I was wondering about the parameters of two APIs of epoll.
epoll_create(int size) - in this API, size is defined as the size of the event pool. But it seems that having more events than the size still works (I set the size to 2 and forced the event pool to have 3 events... but it still works!?). So I was wondering what this parameter actually means, and I'm curious about its maximum value.
epoll_wait(int maxevents) - for this API, the maxevents definition is straightforward. However, there is a lack of information or advice on how to determine this parameter. I expect this parameter to change depending on the size of the epoll event pool. Any suggestions or advice would be great. Thank you!
1.
"man epoll_create"
DESCRIPTION
...
The size is not the maximum size of the backing store but just a hint
to the kernel about how to dimension internal structures. (Nowadays,
size is unused; see NOTES below.)
NOTES
Since Linux 2.6.8, the size argument is unused, but must be greater
than zero. (The kernel dynamically sizes the required data structures
without needing this initial hint.)
2.
Just determine a suitable number yourself, but be aware that giving it a small number may reduce efficiency a little bit, because the smaller the number assigned to "maxevents", the more often you may have to call epoll_wait() to consume all the events already queued on the epoll.
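As a small illustration of that drain loop, here is a sketch using Python's select.epoll wrapper (the maxevents value of 64 is just an assumed figure; the same reasoning applies to the C API):

    import select
    import socket

    # A non-blocking listener registered with epoll; note the old "size"
    # hint from epoll_create is not even exposed by this wrapper.
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    listener.setblocking(False)

    ep = select.epoll()
    ep.register(listener.fileno(), select.EPOLLIN)

    MAXEVENTS = 64  # assumed value; smaller means more epoll_wait round-trips

    while True:
        # poll() returns at most MAXEVENTS ready (fd, mask) pairs per call;
        # anything left over is simply returned by the next call.
        for fd, mask in ep.poll(1.0, MAXEVENTS):
            if fd == listener.fileno():
                conn, _ = listener.accept()
                conn.close()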
