I have a question related to the Hyperledger Fabric blockchain data model. I need to handle two types of objects, and I'll probably need a separate smart contract for each one to load and handle the data related to that type of object.
For instance, let's say I need a smart contract to handle tokens that represent pencils, so that they can be transferred from one account to another. And let's say that I also need a separate smart contract to handle the trees that were involved in the creation of the pencils, because first I need to process the data about the trees, to later issue the corresponding UTXO tokens in the channel.
Let's suppose that, for every tree that is added to the blockchain, 100 pencil tokens should be issued.
I'm going to be using the UTXO token provided by Hyperledger Fabric in https://github.com/hyperledger/fabric-samples/tree/main/token-utxo to handle the pencil tokens, which already provides functions to "mint" and "transfer" tokens, but I need some insight into how to relate the logic between this smart contract and the contract that processes the trees in the ledger. It would be ideal to establish a way of tracking the pencil tokens, providing functionality to prove that they were created from a particular tree.
Any information or tips on how to do this will be highly appreciated.
The UTXO model is used for tracking fungible tokens, that is, tokens that are identical and interchangeable. If you want to track the source of your pencils, then they are not identical and interchangeable, so you should instead consider a non-fungible token model where each asset can have unique properties. For example, each pencil could have a unique identifier (key) with a set of JSON properties (value), where the source tree is one of the JSON properties. See the asset transfer samples for ideas around modeling and tracking non-fungible assets with properties.
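As a rough sketch of that model (plain Python standing in for chaincode, with every name here invented; a real Fabric contract would read and write the world state through the chaincode stub's getState/putState):

```python
import json

# Toy stand-in for the ledger's key/value world state.
world_state = {}

def register_tree(tree_id: str, species: str) -> None:
    """Record a tree, then issue 100 uniquely keyed pencil assets from it."""
    world_state[f"tree:{tree_id}"] = json.dumps({"species": species})
    for i in range(100):
        # Unique identifier (key) per pencil; the source tree is just
        # one of the asset's JSON properties (value).
        world_state[f"pencil:{tree_id}:{i}"] = json.dumps({
            "owner": "factory",
            "sourceTree": tree_id,
        })

register_tree("tree42", "cedar")
print(json.loads(world_state["pencil:tree42:0"])["sourceTree"])  # tree42
```

A transfer then just rewrites the owner property under the same key, so the provenance property survives every transfer.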
We currently have a home-grown authz system in production that uses the OPA/Rego policy engine as its core for decision making (close to what Netflix has done). We've been looking at the Zanzibar ReBAC model to replace our OPA/policy-based decision engine, and AuthZed got our attention. Looking further at AuthZed, we like the idea of defining a schema of "resource + subject" types and their relations (like an OOP model). We like the simplicity of using a social graph between resource and subject to answer questions. But the more we dig in and think about real usage patterns, the more questions and gaps in clarity we run into. I've put those thoughts down below; I hope it's not confusing...
[Doubts/Questions]
[tuple-data] Resource data/metadata must be continuously added into the authz system in the form of tuple data.
e.g. doc{org, owner} must be added as a tuple to populate the relation in the decision graph. Assume I'm a CMS system: am I expected to insert (or update) a tuple in the authz engine for every single doc created in my CMS system, for its whole lifetime?
Resource-owning applications are kept on the hook (responsible) for continuously keeping this data current.
What about old/stale relation data (tuples)? The authz engine doesn't know whether they are stale or not... is the app burdened with tidying them up?
[check-api] An authz check is answered by a graph-walking mechanism that traverses the [resource --to--> subject] path.
There is no dynamic element in the decision making - nothing like a Rego rule script that decides based on a JSON payload.
How do we make dynamic decisions based on a JSON payload?
You're correct about the application being responsible for the authorization data it "owns". If you intend to have a unique role/relationship for each document in your system, then you do need to write/delete those relationships as the referenced resources (or the roles on them, more likely) change, but if you are using an RBAC-like design for your schema, you'd have to apply these role changes anyway; you'd just apply them to SpiceDB, instead of to your database. Likewise, if you have a relationship between say, a document and its parent organization, you do have to write/delete those as well, but that should only occur when the document is created or deleted.
In practice, unless you intend to keep the relationships in both your database and in SpiceDB (which some users do), you'll generally only have to write them to one or the other. If you do intend to apply them to both, you can either just perform the updates to both at the same time, or use an outbox-like pattern to synchronize behind the scenes.
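A minimal sketch of that outbox idea, using sqlite3 for brevity (the tuple string mirrors SpiceDB's resource#relation@subject notation, and write_to_spicedb is a placeholder for a real client call):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, org TEXT)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, rel TEXT)")

def write_to_spicedb(rel: str) -> None:
    print("would WriteRelationships:", rel)  # placeholder for the real client

def create_document(doc_id: str, org: str) -> None:
    # One transaction writes both the document and its pending relationship,
    # so the application database and the outbox can never disagree.
    with conn:
        conn.execute("INSERT INTO documents VALUES (?, ?)", (doc_id, org))
        conn.execute(
            "INSERT INTO outbox (rel) VALUES (?)",
            (f"document:{doc_id}#parent@organization:{org}",),
        )

def drain_outbox() -> None:
    # A background worker replays pending rows into SpiceDB, deleting
    # each one only after the write succeeds.
    for row_id, rel in conn.execute("SELECT id, rel FROM outbox").fetchall():
        write_to_spicedb(rel)
        conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))

create_document("doc1", "acme")
drain_outbox()
```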
Having to be proactive in your applications about storing data in a centralized system is necessary for data consistency. The alternative is federated systems that reach into other services. Federated systems come with the trade-off of being eventually consistent and can also suffer from priority inversion. I presented on the centralized vs. federated trade-offs in a bit of depth, along with other design aspects of authorization systems, in my presentation on the cloud native authorization landscape.
Caveats are a new feature in SpiceDB that enable dynamic policy to be enforced on the relationship graph. Caveats are defined using Google's Common Expression Language, which is a language used for policy in other cloud-native projects like Kubernetes. You can also use caveats to create relationships that eventually expire, if you want to take some of the book-keeping out of your app code.
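For a flavor of what that looks like, here is an illustrative schema fragment patterned after the examples in the SpiceDB docs (the caveat name and parameters are my own invention), kept in a Python string purely for quoting convenience:

```python
# A relationship written as "user with not_expired" carries an expires_at
# value in its caveat context; "now" is supplied by the caller at check time.
SCHEMA = """
caveat not_expired(now timestamp, expires_at timestamp) {
    now < expires_at
}

definition user {}

definition document {
    relation viewer: user | user with not_expired
    permission view = viewer
}
"""
```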
I'm building an ontology in Protégé for data-sharing management in IoT environments. For this purpose I want to create a package that contains the observation (raw data), information on its provenance, and the licence that the data consumer will need to accept and respect in order to use the resource. The aim of our project is to make this "package" the entity that circulates between users, rather than just the raw data, so that the data owner/producer does not completely lose ownership once their data is shared.
In order to do this, I rather spontaneously created a class named "Package" composed of three disjoint classes: the observation, its provenance information, and the generated licence. However, I realized that this does not mean "a package is composed of those three elements", but rather "each one of those three elements is a package", which is not at all what I'm seeking.
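To make the difference concrete, here is what I did versus what I actually mean, sketched in owlready2 terms (the property at the end is exactly the kind of thing I was hoping to avoid):

```python
from owlready2 import Thing, ObjectProperty, get_ontology

onto = get_ontology("http://example.org/datasharing.owl")  # placeholder IRI

with onto:
    class Observation(Thing): pass
    class Provenance(Thing): pass
    class Licence(Thing): pass

    class Package(Thing): pass

    # What I did, in effect: making the three classes subclasses of Package,
    # which asserts "every Observation IS A Package", and so on.
    # Observation.is_a.append(Package)  # <- the unwanted reading

    # What I mean: Package stands in a part-whole relation to the three.
    class isComposedOf(ObjectProperty):
        domain = [Package]

    # Every Package has at least one of each part.
    Package.is_a.append(isComposedOf.some(Observation))
    Package.is_a.append(isComposedOf.some(Provenance))
    Package.is_a.append(isComposedOf.some(Licence))
```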
Is there a way to express the composition, without (for example) having to create an Object Property named "isComposedOf" ?
Thank you in advance for your time. Please don't hesitate if you need other details
This seems like a pretty common use-case. Let's say we have sensitive PII that we want to protect, such as SSNs. We mask that data using dynamic data masking in Snowflake. Now we have an engineer that is writing data transformations, and they need to join two tables using SSN. They don't have clearance to view the SSNs, but they can view the other information on both tables. I want the engineer to be able to join the two tables, and see all the combined unsecured data, while keeping the SSN secret from the engineer. I'm really not sure why Snowflake doesn't use real values for joins behind the scenes while refusing to return them in results. Is there a workaround?
One idea is to make the masking policy return a hash of the initial value. That has a couple of limitations. First, it is explicitly warned against in the Snowflake docs. Second, it requires runtime hashing of all the values, which slows down query execution seemingly needlessly. Third, there is the issue of hash collisions which could break joins. This could result in an engineer spending days working to track down a bug in their code, only to realize that the extra rows in their dataset are the result of a hash collision.
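For concreteness, the hash-returning policy I'm describing would look roughly like this (role, table, and column names invented; again, the docs warn against this pattern):

```python
# Snowflake SQL, kept in strings for illustration.
CREATE_POLICY = """
CREATE OR REPLACE MASKING POLICY ssn_hash_mask AS (val STRING)
RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val
        ELSE SHA2(val)  -- deterministic, so equal SSNs still join... usually
    END;
"""

APPLY_POLICY = """
ALTER TABLE customers MODIFY COLUMN ssn
    SET MASKING POLICY ssn_hash_mask;
"""
```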
Another potential solution is using an external tokenization provider (docs). I don't understand this option well, but it appears that this would mean that I would need to store the actual values and their tokenized form with a third party service, then make an API call each time I wanted to use the values in a query. That seems less than ideal. I'd rather the solution be contained within Snowflake.
I'd love to hear any thoughts, thanks in advance.
If you care about database integrity and want to avoid errors: don't use SSNs as identifiers.
An SSN can be a property of a person, but don't use it as their primary key.
As the United States Social Security Administration says:
A 1990 OIG, HHS study indicated that 45% of organizations, both public and private, using SSNs make no effort to verify SSN accuracy. This leads to the real possibility that transfers of data from one organization to another could be inaccurate; computer matching of data between different organizations could be invalid; and innocent persons could be subjected to unwarranted intrusions into their privacy or improper changes in their benefits or services or even misidentified with serious results.
Also:
The SSN is the single most widely used record identifier for both government and the private sector, exerting a broad influence on the lives of most Americans. However, by itself, it is not a personal identifier because it lacks systematic assignment to every person and the means to authenticate a person's identity.
https://www.ssa.gov/history/reports/ssnreportc2.html
Instead you could create a unique id for each person within your database, and use that key for joins.
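A minimal sketch of that, using sqlite3 just to illustrate (table and column names invented): each person gets a surrogate person_id, joins use it, and the SSN never participates in the join at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons  (person_id INTEGER PRIMARY KEY, name TEXT, ssn TEXT);
CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, person_id INTEGER, balance REAL);
CREATE TABLE claims   (claim_id INTEGER PRIMARY KEY, person_id INTEGER, status TEXT);

INSERT INTO persons  VALUES (1, 'Ada', '***-**-****');  -- SSN can stay masked
INSERT INTO accounts VALUES (10, 1, 250.0);
INSERT INTO claims   VALUES (100, 1, 'open');
""")

# The engineer joins on the surrogate key and never touches the SSN.
rows = conn.execute("""
    SELECT a.balance, c.status
    FROM accounts a
    JOIN claims   c ON c.person_id = a.person_id
""").fetchall()
print(rows)  # [(250.0, 'open')]
```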
I am building a master database to store all relevant information about our customers. I am using Neo4j.
Below is a sample of our model. We have Person, which can be registered in 3 of our mobile applications (App 01, App 02, App 03). We use a CPF key; it is like an SSN. In those apps the user can be registered with an email, which is represented by the Email entity. Those users can have multiple addresses, represented by the Address entity.
The question is:
As I am building master data, IMO, if someone queries the MDM database asking for all the "best" information about a person, I would return, for example:
Name: John
Best email: email2 (because it has two apps using it)
Best address: addr1 (because it has two apps using it)
So I am going to build some heuristics to define the "best" email and address.
For this purpose, I have some options:
I could create an edge from John to email2 and to addr1, so it would be easy for a user of the MDM to get the "best" address/email for John.
I could build a REST API endpoint and apply this heuristic at query time.
Does anyone have experience using graph databases or designing MDM databases?
Is it a good approach?
This question is a complement to the question: Using Neo4j to build a Master Data Management
The graph data model is good for storing your master data; however, your master data will most likely co-exist with operational and reference data in the form of dimensions.
If you decide to go with a graph model for your MDM, make sure that you have a well-defined semantic model for the core dimensions in MDM, usually:
Products
Customers
Employees
Assets
Locations
These core dimensions become attributes of your nodes.
Also, decide which MDM architecture style you are going to adopt; some popular ones are:
The Registry - Graph fits very well with this style because your master data remains in the SOR (system of record) and the references can be represented in the graph very nicely.
Master Data Hub - Extra transformations are required to transpose your system of record from tabular form to the graph.
Master-Master - This style fits well with your MDM in the graph if you do not have too many legacy apps that depend on your MDM.
Approach 1 would add a lot of essentially redundant information (about 2N extra relationships, where N is the number of people), and also require more complex coding to handle changes to a person's apps. And, as always when information is stored redundantly, you would have to be especially careful that inconsistencies do not creep in. But, it should be faster when querying for the "best" contact info.
Approach 2 keeps the DB the same size, but requires a more complex and slower query to get the "best" contact info. However, changing a person's apps and contact info is straightforward.
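For example, the approach-2 query might look roughly like this with the Python driver (the labels and relationship types are guesses at the question's model):

```python
from neo4j import GraphDatabase

# Counts, per email, how many apps the person registered it with,
# and returns the most-used one as the "best" email.
BEST_EMAIL = """
MATCH (p:Person {cpf: $cpf})-[:REGISTERED_IN]->(app:App)-[:USES_EMAIL]->(e:Email)
RETURN e.address AS email, count(app) AS apps
ORDER BY apps DESC
LIMIT 1
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))
with driver.session() as session:
    best = session.run(BEST_EMAIL, cpf="123.456.789-00").single()
    print(best["email"], best["apps"])
driver.close()
```

Approach 1 would instead run this kind of aggregation once per change and MERGE a BEST_EMAIL edge, trading write-time work for cheap reads.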
To decide which approach to use, you should consider whether DB size is an issue, and also look at your use cases and how frequently they will be performed.
Here is a simple heuristic if DB size is not an issue. Suppose G is the frequency at which you need to get a person's "best" contact info, and M is the frequency at which you need to modify a person's apps or contact info. You would pick approach 1 if the ratio G/M exceeds some threshold value K, which you would have to decide on, taking the considerations above into account.
I have had a request from a client to pull the Lab Name and CLIA information from several different vendors' HL7 feeds. The problem is I am unsure which node I should really pull this information from.
I notice one vendor is using ZPS, and it appears they have the Lab Name and CLIA there, although I see that others do not use the ZPS segment. Just curious: what would be the appropriate node to pull these from?
I see the header nodes look really abbreviated with some of my vendors. I need a perfectly readable name like 'Johnson Hospital'. Any suggestions on the field you would use to pull the CLIA and Lab Name?
Welcome to the wild world of HL7. This exact scenario is why interface engines are so prevalent and useful for message exchange in the healthcare industry.
Up until HL7 v2.5.1, I believe, there was no standardization around CLIA identifiers. Assuming you are receiving an ORU^R01 message, you may want to look at segment OBX, field 15, which may contain the producer or lab identifier. The only catch is that there is a very slim chance that they are using HL7 2.5.1 or implementing the guidelines as intended. There are a lot of reasons for all of this, but the takeaway is that you should be prepared to do some work here for each and every integration.
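For example, with the python-hl7 library (the message below is a toy ORU^R01 fragment; real feeds will differ):

```python
import hl7

# HL7 v2 segments are separated by carriage returns.
message = "\r".join([
    "MSH|^~\\&|LAB|JH123|EMR|CLINIC|202401011200||ORU^R01|42|P|2.5.1",
    "OBX|1|NM|GLU^Glucose||95|mg/dL|70-99||||F||||JH123^Johnson Hospital",
])

parsed = hl7.parse(message)
obx = parsed.segments("OBX")[0]
print(obx[15])  # OBX-15, Producer's ID, where the lab/CLIA may live
```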
For the data, be prepared to exchange or ask for a technical specification from your trading partner. If that is not a possibility, or if they do not have one, you should either ask for a sample export of representative messages from their system or see if they have a vendor reference. Since the data that you are looking for is not quite as established as something like an address, there is a high likelihood that you will have to get this data from different segments and fields for each trading partner. The ZPS segment that you have in your example is a good reference. Any segment that starts with Z is a custom segment, created because the vendor or trading partner could not find a good existing place to store that data, so they made a new segment to store it themselves.
For the identifiers, what I would recommend is to create a translation or mapping table. So, if you receive JHOSP or JH123, you can translate/map that to 'Johnson Hospital'. Each EMR or hospital system will have its own way to represent different values and there is no guarantee that they will be consistent, so you must be prepared to handle that scenario.
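Concretely, the mapping table can start as a simple per-feed dictionary (identifiers invented), failing loudly when an unmapped code appears:

```python
# Vendor identifier -> readable display name; extend as feeds are onboarded.
LAB_NAME_MAP = {
    "JHOSP": "Johnson Hospital",
    "JH123": "Johnson Hospital",
    "STM01": "St. Mary Medical Center",
}

def lab_display_name(code: str) -> str:
    try:
        return LAB_NAME_MAP[code]
    except KeyError:
        # Surface unmapped identifiers instead of guessing, so every new
        # trading partner's codes get reviewed and added deliberately.
        raise ValueError(f"no lab-name mapping for identifier {code!r}")

print(lab_display_name("JH123"))  # Johnson Hospital
```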