Would it be possible to use a PersistentEntityStore and one or more plain Store instances in the same Environment instance? I was hoping to use transactions that cover changes on such a combination.
I see potential conflicts with store names that I would have to avoid. Anything else?
It's possible to mix code using different API layers inside a single transaction. The only requirement is the data touched by different API should be isolated, disjoint sets of names of Stores should be used.
What are the names of Stores used by PersistentEntityStore? Any PersistentEntityStore has its own unique name, and names of all Stores, that represent mapping of the entity store to key/value layer, start with "${PersistentEntityStore name}.", as it's specified in the source code.
Another issue is that API is not complete for such approach. After a StoreTransaction is created against the PersistentEntityStore, it should be be cast to PersistentStoreTransaction in order to call PersistentStoreTransaction#getEnvironmentTransaction() for getting underlying transaction:
final StoreTransaction txn = entityStore.beginTransaction();
// here is underlying Transaction instance:
final Transaction envTxn = ((PersistentStoreTransaction) txn).getEnvironmentTransaction();
Related
I have to implement a system where a tenant can store multiple key-value stores. one key-value store can have a million records, and there will be multiple columns in one store
[Edited] I have to store tabular data (list with multiple columns) like Excel where column headers will be unique and have no defined schema.
This will be a kind of static data (eventually updated).
We will provide a UI to handle those updates.
Every tenant would like to store multiple table structured data which they have to refer it in different applications and the contract will be JSON only.
For Example, an Organization/Tenant wants to store their Employees List/ Country-State List, and there are some custom lists that are customized for the product and this data is in millions.
A simple solution is to use SQL but here schema is not defined, this is a user-defined schema, and though I have handled this in SQL, there are some performance issues, so I want to choose a NoSQL DB that suits better for this requirement.
Design Constraints:
Get API latency should be minimum.
We can simply assume the Pareto rule, 80:20 80% read calls and 20% write so it is a read-heavy application
Users can update one of the records/one columns
Users can do queries based on some column value, we need to implement indexes on multiple columns.
It's schema-less so we can simply assume it is NoSql, SQL also supports JSON but it is very hard to update a single row, and we can not define indexes on dynamic columns.
I want to segregate key-values stores per tenant, no list will be shared between tenants.
One Key Value Store :
Another key value store example: https://datahub.io/core/country-list
I am thinking of Cassandra or any wide-column database, we can also think of a document database (Mongo DB), every collection can be a key-value store or Amazon Dynamo database
Cassandra: allows you to partition data by partition key and in my use case I may want to get data by different columns in Cassandra we have to query all partitions which will be expensive.
Your example data shows duplicate items, which is not something NoSQL datbases can store.
DynamoDB can handle this scenario quite efficiently, its well suited for high read activity and delivers consistent single digit ms low latency at any scale. One caveat of DynamoDB compared to the others you mention is the 400KB item size limit.
In order to get top performance from DynamoDB, you have to utilize the Partition key as much as possible, because it provides you with hash-based access (super fast).
Its obvious that unique identifier for the user should be present (username?) in the PK, but if there is another field that you always have during request time, like the country for example, you should include it in the PK.
Like so
PK SK
Username#S2#Country#US#State#Georgia Address#A1
It might be worth storing a mapping for the countries alone so you can retrieve them before executing the heavy query. Global Indexes can't be more than 20, keep that in mind and reuse/overload indexes and keys as much as possible.
Stick to single table design to utilize this better.
As mentioned by Lee Hannigan, duplicated elements are not supported, all keys (including those of the indexes) must be unique pairs
In my code I want to take advantage of ETS's bag type that can store multiple values for single key. However, it would be very useful to know if insertion actually inserts a new value or not (i.e. if the inserted key with value was or was not present in the bag).
With type set of ETS I could use ets:insert_new, but semantics is different for bag (emphasis mine):
This function works exactly like insert/2, with the exception that instead of overwriting objects with the same key (in the case of set or ordered_set) or adding more objects with keys already existing in the table (in the case of bag and duplicate_bag), it simply returns false.
Is there a way to achieve such functionality with one call? I understand it can be achieved by a lookup followed by an optional insert, but I am afraid it might hurt performance of concurrent access.
I am building a large graph database that has a significant set of meta data about each node (thousands of properties per node). I am currently going through the process of determining which meta data should be a node within Neo4j, which should become a property of the Node and which should be housed in a separate database.
My thinking is to use the meta data in 3 ways:
1 - If the property is shared between many nodes, to make that property it's own node and create an edge to that property.
2 - If the property is important to traversing the graph, but not "highly" shared, to add that as a node property. (Which could also be indexed within Neo4j if needed)
3 = If the meta data is strictly describing that node, to have that stored in a separate NoSQL database, with the Neo4J Node ID becoming the foreign key to the other database.
While it seems like the most efficient use of using the graph database, it seems like a pain to have the different property types and having to determine which type of property it is before using it. (Likely a property lookup key-value store) It would also likely mean that I would need an easy way to promote a property from 3 to 2 to 1 for instances when a property becomes highly shared, or needed for efficient traversal.
Has anyone taken this approach? Any thoughts to share, or things to avoid?
Do never ever store a Neo4j node id in an external system. The node id is basically a offset in the respective store file. If you delete a node its id might be reused when new nodes are created.
The right approach is to have a "good" identifier (e.g. uuid) as a node property and put that into Neo4j's index. That uuid is then save to be stored in third party systems.
Some time ago I've created a unmanaged extension that adds a uuid to each new node and prevent manual changes to these uuids: https://github.com/sarmbruster/neo4j-uuid.
Update (2013-08-21)
I've blogged about UUIDs with Neo4j.
I am doing some maintenance on a database for an application that uses the Bold for Delphi object persistence framework. This database has been been in production for several years and several of the tables have grown quite large. One of them is the BOLD_CLOCKLOG which has something to do with Bold's transaction management.
I want to trim this table (it is up to 1.2GB, with entries from Jan 2006).
Can anyone confirm the system does not need this old information?
From the bolds documentation:
BOLD_CLOCKLOG
To be able to map the transaction numbers used in the TimeStamp columns to the corresponding physical time (such as 2001-01-01 12:34) the persistence mapper will store a log with timestamps and times. Normally, this log is written for each database operation, but if the traffic to the database is very intensive, it is possible to restrict how often this log is written by setting the property ClockLogGranularity. The event OnGetCurrentTime should also be implemented to ensure that all clients have the same time.The usage of this table can be controlled with the tagged value: Model.UseClockLog
So I believe this is used for versioning Boldobjects, see Object Versioning Extension in bolds documentation. If your application don't need this you can drop this in the database.
In our Bold application we don't use that feature. Why don't simply test to turn off Bold_ClockLog in the model, drop that big table and try to use your application. I'm pretty sure if something is wrong then it say so at once.
I can also mention that we have an own custom objecthistoy. It is simply big string (as TStringList.DelimetedText) in a ObjectHistory class that have Time, user and a note about action. This suits our need better that Bolds builtin objecthistory. The disadvantage is of course that we need to add calls in the code when logging to history is done.
Bold_ClockLog is an optional table, it's purpose is to store mapping between integer timestamps and corresponding DateTime values.
This allows you to find out datetime of the last modification to any object.
If you don't need this feature feel free to empty the table, it won't cause any problems.
In addition to Bold_ClockLog, the Bold_XFiles is another optional table that tends to grow large. But unlike the Bold_ClockLog the Bold_XFiles can not be emptied.
Both of these tables can be turned on/off in the model tag values.
Is there an equivalent to an identity Seed in SimpleDB?
If the answer is no, how do you handle creating something like a customer number or order number that will prevent the creation duplicate numbers?
My experience is mainly from SQL Server in which I would either create a primary key with an identity seed or use transactions in a stored procedure to increment the number.
Thanks for your help!
You can create unique keys using conditional writes. Just do a PutAttributes with the next customer number you want to use and the data you want to store. You can't add a condition for the actual item name, but you can use an attribute that always exists, (like creation date or user group).
Set the conditions:
Expected.1.Name=creation_date
Expected.1.Exists=false
The call will succeed only if there is no creation_date in an item with that item name. If you always write the creation_date, then you get the effect of optimistic locking on the new item name. Of course you can use any attribute you want, so long you always include it in that first conditional put.
The performance of the conditional write is the same as a normal write in most situations but when SimpleDB is under heavy load or high internal network latencies, these calls will take longer, compared to normal writes. During rare failure scenarios inside SimpleDB, the conditional writes will fail completely for a period of time.
If you can't tolerate this, you will have to code some sort of alternate way to get your unique keys during outages. A different SimpleDB region could be used for key generation only, since SimpleDB will still accept the normal writes (non-conditional PutAttributes) during outages.
If you don't already have something unique that will work, using a GUID for the Item is probably the typical solution.