Functional Decomposition Diagrams and Data Flow Diagrams - system-design

What is the relationship between an FDD and a DFD of the same system?

Functional decomposition is about partitioning the functionality of a big, complicated system into smaller, preferably simpler parts. The FDD is a tool that aids you in this process. Basically, you are breaking down the capabilities of a complicated system into a set of more specific, logically grouped capabilities.
Now, a data flow diagram deals with how data flows through a system for a specific function of the system. So, each of the above-mentioned capabilities might very well have its own unique data flows.
For example, say you have an FDD describing a blogging system. You might have functions for displaying a blog post, editing a blog post and potentially sending a link to a blog post to a friend.
These three functions will all have fairly different data flows, which can be modelled separately with DFDs. So, I'd say the relationship between these two types of diagrams is that one can help identify the individual functions, each of which might, or might not, need to have an associated data flow mapped.
Hope that is helpful.
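To make the relationship concrete, here is a minimal sketch that treats the FDD as a tree of functions and attaches a DFD (a list of flows) only to the leaf functions that need one. The class name, the stores and the individual flows are all invented for illustration; only the three blog functions come from the example above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Function:
    """One node in a functional decomposition (FDD) tree."""
    name: str
    children: list[Function] = field(default_factory=list)

    def leaves(self) -> list[Function]:
        """Return the lowest-level functions, which are the DFD candidates."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


# FDD: the blogging system broken into the three functions named above.
blog = Function("Blogging system", [
    Function("Display blog post"),
    Function("Edit blog post"),
    Function("Send link to friend"),
])

# DFD: one list of (source, data, destination) flows per function.
# The flows themselves are assumptions made up for this sketch.
data_flows = {
    "Display blog post": [
        ("Reader", "post id", "Display blog post"),
        ("Post store", "post content", "Display blog post"),
        ("Display blog post", "rendered page", "Reader"),
    ],
    "Edit blog post": [
        ("Author", "edited text", "Edit blog post"),
        ("Edit blog post", "updated post", "Post store"),
    ],
    # "Send link to friend" has no DFD here: not every function needs one.
}

for function in blog.leaves():
    flows = data_flows.get(function.name, [])
    print(f"{function.name}: {len(flows)} flow(s) modelled")
```

The tree answers "what does the system do?", while each entry in `data_flows` answers "how does data move for this one function?".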

Related

Zanzibar doubts about Tuple + Check Api. (authzed/spicedb)

We currently have a home-grown authz system in production that uses the OPA/Rego policy engine as the core for decision making (close to what Netflix has done). We have been looking at the Zanzibar ReBAC model to replace our OPA/policy-based decision engine, and AuthZed got our attention. Looking further at AuthZed, we like the idea of defining a schema of "resource + subject" types and their relations (like an OOP model). We like the simplicity of using a social graph between resource and subject to answer questions. But the more we dig in and think about real usage patterns, the more questions we have and the less clear some aspects become. I put down those thoughts below; I hope it's not confusing...
[Doubts/Questions]
[tuple-data] Resource data/metadata must be continuously added to the authz system in the form of tuple data.
E.g. doc{org,owner} must be added as a tuple to populate the relation in the decision graph. Assume I'm a CMS: am I expected to insert (or update) a tuple in the authz engine for every single doc created in my CMS, for the lifetime of the system?
Resource-owning applications are kept on the hook (responsible) for continuous keep-it-current updates.
How about old/stale relation data (tuples)? The authz engine doesn't know whether they are stale or not... is the app burdened with tidying them up?
[check-api] An authz check is answered by a graph-walking mechanism: a [resource--to-->subject] traversal path.
There is no dynamic element in decision making, like a Rego rule script deciding based on a JSON payload.
How do we make a dynamic decision based on a JSON payload?
You're correct about the application being responsible for the authorization data it "owns". If you intend to have a unique role/relationship for each document in your system, then you do need to write/delete those relationships as the referenced resources (or the roles on them, more likely) change, but if you are using an RBAC-like design for your schema, you'd have to apply these role changes anyway; you'd just apply them to SpiceDB, instead of to your database. Likewise, if you have a relationship between say, a document and its parent organization, you do have to write/delete those as well, but that should only occur when the document is created or deleted.
In practice, unless you intend to keep the relationships in both your database and in SpiceDB (which some users do), you'll generally only have to write them to one or the other. If you do intend to apply them to both, you can either just perform the updates to both at the same time, or use an outbox-like pattern to synchronize behind the scenes.
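As a rough illustration of the write/delete pattern described above, here is a toy in-memory relationship store. It is not the SpiceDB API; `RelationshipStore`, `create_document`, `delete_document` and the `document:`/`org:`/`user:` identifiers are all hypothetical names for this sketch. The point is only that the document-to-organization relationship is written once at creation and removed once at deletion, while the check walks the graph.

```python
from collections import defaultdict


class RelationshipStore:
    """Toy stand-in for a Zanzibar-style tuple store (not the SpiceDB API)."""

    def __init__(self):
        # Maps (resource, relation) to the set of subjects holding that relation.
        self._tuples = defaultdict(set)

    def write(self, resource, relation, subject):
        self._tuples[(resource, relation)].add(subject)

    def delete_resource(self, resource):
        """Remove every tuple whose resource is the given object."""
        for key in [k for k in self._tuples if k[0] == resource]:
            del self._tuples[key]

    def subjects(self, resource, relation):
        return self._tuples.get((resource, relation), set())

    def check_view(self, document, user):
        """Viewable by the document's owner, or by any member of its org."""
        if user in self.subjects(document, "owner"):
            return True
        return any(user in self.subjects(org, "member")
                   for org in self.subjects(document, "org"))


store = RelationshipStore()
# Membership changes only when people join or leave the organization.
store.write("org:acme", "member", "user:bob")


def create_document(doc_id, org, owner):
    # Written once, at the moment the application creates the document.
    store.write(f"document:{doc_id}", "org", org)
    store.write(f"document:{doc_id}", "owner", owner)


def delete_document(doc_id):
    # Removed once, when the application deletes the document.
    store.delete_resource(f"document:{doc_id}")


create_document("1", "org:acme", "user:alice")
print(store.check_view("document:1", "user:bob"))   # reached via the org
delete_document("1")
print(store.check_view("document:1", "user:bob"))   # tuples are gone
```

If the same relationships also live in your own database, the two `store.write` calls are where a dual write or an outbox entry would go.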
Having to be proactive in your applications about storing data in a centralized system is necessary for data consistency. The alternative is federated systems that reach into other services. Federated systems come with the trade-offs of being eventually consistent and can also suffer from priority inversion. I covered the centralized vs federated trade-offs in a bit of depth, along with other design aspects of authorization systems, in my presentation on the cloud native authorization landscape.
Caveats are a new feature in SpiceDB that enable dynamic policy to be enforced on the relationship graph. Caveats are defined using Google's Common Expression Language, which is a language used for policy in other cloud-native projects like Kubernetes. You can also use caveats to make relationships that eventually expire, if you want to take some of the bookkeeping out of your app code.
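To give a feel for the idea, here is a toy model of a conditional, expiring relationship. This is only a conceptual sketch: in SpiceDB the condition would be a CEL expression declared in the schema, whereas here a plain Python callable stands in for it, and `Relationship`, `is_active` and the `department` context key are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass
class Relationship:
    resource: str
    relation: str
    subject: str
    # Stand-in for a caveat: a predicate over the request-time context.
    caveat: Optional[Callable[[dict], bool]] = None
    # Optional expiry, so stale relationships stop counting on their own.
    expires_at: Optional[datetime] = None


def is_active(rel: Relationship, context: dict, now: datetime) -> bool:
    """A relationship counts only if it is unexpired and its caveat holds."""
    if rel.expires_at is not None and now >= rel.expires_at:
        return False
    if rel.caveat is not None and not rel.caveat(context):
        return False
    return True


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
rel = Relationship(
    "document:1", "viewer", "user:alice",
    caveat=lambda ctx: ctx.get("department") == "finance",
    expires_at=now + timedelta(days=7),
)

print(is_active(rel, {"department": "finance"}, now))                      # True
print(is_active(rel, {"department": "sales"}, now))                        # False
print(is_active(rel, {"department": "finance"}, now + timedelta(days=8)))  # False
```

The context dictionary plays the role of the JSON payload from the question: the graph still decides who is related to what, and the caveat decides whether that relationship applies to this particular request.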

Migrating from Neo4j to Grakn

I'm in the process of migrating a Neo4j database into Grakn for genomics and biological data. I have the files in CSV for this, but I need an ETL tool to solve this problem in the simplest way.
I am following this template Python migrator:
https://blog.grakn.ai/loading-data-and-querying-knowledge-from-a-grakn-knowledge-graph-using-the-python-client-b764a476cda8
Am I correct in thinking this way -
Do nodes map to entities?
Do edges in neo4j map to relationships in Grakn?
Do labels map to attributes?
While it is possible to use a direct mapping of the property-graph model to the entity-relationship model (used by Grakn), it is highly likely that limitations and shortcomings of the property graph model will be transferred. This is why Grakn does not provide or encourage a completely general migration tool. Every Grakn knowledge graph should be powered by a thought-out model (ie. schema) that is tailored to the intended domain.
To outline how one can easily (re)model a dataset in Grakn, the key is to create a schema that closely resembles how we perceive data in the real world in terms of things and their interactions. This easily maps onto the Entity-Relationship-Attribute model Grakn uses. It is common to iterate several times before settling on the final schema (though it can always be extended later).
Then we can:
ask intuitive questions (in the form of Graql queries) - using the defined Entities/Relationships/Attributes that map closely to our mental model
build an intelligent database that is capable of reasoning over data the same way we do, by adding logical, deductive rules that apply in our domain
I encourage you to check out this blog post on the challenges of working with graph databases, and for any domain-specific modeling questions head over to the Grakn community forum.
Good luck and welcome to Grakn!
If you map your property graph directly to GRAKN, you will end up with relations that are most likely named as verbs connecting only two objects (one of which will appear to be a subject and the other an object). GRAKN will be fine with this but, as mentioned previously, it may make leveraging all the goodness in GRAKN more difficult. In particular, converting existing graph structures to hyperedges may take some significant reengineering. But the good news is that the ETL would be straightforward.
A better solution would be to define your ideal schema first in GRAKN (taking advantage of hyperedges), then fashion an ETL to populate the schema. In such a case, the ETL might be simple or complex. It would depend on how complex your original data was and how complex the new schema was.
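As a sketch of what a schema-first ETL step could look like, the snippet below turns CSV rows into Graql-style insert queries for a single three-way relation (a hyperedge) instead of several two-way edges. Everything schema-related here is an assumption: the types `gene`, `disease`, `publication`, the relation `gene-disease-association`, its role names, the attribute names and the sample rows are invented, and the query syntax should be checked against the Graql version you are running. The snippet only builds strings; sending them through the Python client is left out.

```python
import csv
import io
import json

# Illustrative rows; in practice this would be the CSV exported from Neo4j.
CSV_DATA = """gene,disease,publication
BRCA1,breast cancer,paper-1
TP53,Li-Fraumeni syndrome,paper-2
"""


def quote(value: str) -> str:
    """Wrap a value in double quotes, escaping embedded quotes."""
    return json.dumps(value)


def association_query(row: dict) -> str:
    """Build one match-insert query that links three existing things."""
    return (
        f"match $g isa gene, has symbol {quote(row['gene'])}; "
        f"$d isa disease, has name {quote(row['disease'])}; "
        f"$p isa publication, has identifier {quote(row['publication'])}; "
        "insert (associated-gene: $g, associated-disease: $d, evidence: $p) "
        "isa gene-disease-association;"
    )


queries = [association_query(row) for row in csv.DictReader(io.StringIO(CSV_DATA))]
for query in queries:
    print(query)
```

In a direct property-graph mapping the same row would typically become two or three binary edges; here the gene, the disease and the supporting publication all play roles in one relation.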

Does it make sense to interrogate structured data using NLP?

I know that this question may not be suitable for SO, but please let this question be here for a while. Last time my question was moved to cross-validated, it froze; no more views or feedback.
I came across a question that does not make much sense to me. How can IFC models be interrogated via NLP? Consider IFC models as semantically rich structured data. IFC defines an EXPRESS-based entity-relationship model consisting of entities organized into an object-based inheritance hierarchy. Examples of entities include building elements, geometry, and basic constructs.
How could NLP be used for such type of data? I don't see NLP relevant at all.
In general, I would suggest that using NLP techniques to "interrogate" already (quite formally) structured data like EXPRESS would be overkill at best and a time / maintenance sinkhole at worst. In general, the strengths of NLP (human language ambiguity resolution, coreference resolution, text summarization, textual entailment, etc.) are wholly unnecessary when you already have such an unambiguous encoding as this. If anything, you could imagine translating this schema directly into a Prolog application for direct logic queries, etc. (which is quite a different direction than NLP).
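To illustrate why a direct structured query is enough here, consider this small sketch of querying an inheritance hierarchy. The type names are IFC-style, but the hierarchy is a deliberately simplified toy and the instances are invented; it is not a reading of the real schema.

```python
# Child type -> parent type. Simplified toy hierarchy with IFC-style names.
SUPERTYPE = {
    "IfcWall": "IfcBuildingElement",
    "IfcDoor": "IfcBuildingElement",
    "IfcBuildingElement": "IfcElement",
    "IfcElement": "IfcProduct",
    "IfcSpace": "IfcProduct",
}

# (instance name, entity type) pairs, made up for this example.
INSTANCES = [
    ("wall-1", "IfcWall"),
    ("door-1", "IfcDoor"),
    ("room-1", "IfcSpace"),
]


def is_a(entity_type, ancestor):
    """Walk up the hierarchy until the ancestor is found or the root is passed."""
    while entity_type is not None:
        if entity_type == ancestor:
            return True
        entity_type = SUPERTYPE.get(entity_type)
    return False


def instances_of(ancestor):
    """Exact, unambiguous query: every instance whose type derives from ancestor."""
    return [name for name, entity_type in INSTANCES if is_a(entity_type, ancestor)]


print(instances_of("IfcBuildingElement"))  # ['wall-1', 'door-1']
print(instances_of("IfcProduct"))          # ['wall-1', 'door-1', 'room-1']
```

There is nothing for NLP to disambiguate: the schema already says what every entity is.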
I did some searches to try to find the references you may have been referring to. The only item I found was Extending Building Information Models Semiautomatically Using Semantic Natural Language Processing Techniques:
... the authors propose a new method for extending the IFC schema to incorporate CC-related information, in an objective and semiautomated manner. The method utilizes semantic natural language processing techniques and machine learning techniques to extract concepts from documents that are related to CC [compliance checking] (e.g., building codes) and match the extracted concepts to concepts in the IFC class hierarchy.
So in this example, at least, the authors are not "interrogating" the IFC schema with NLP, but rather using it to augment existing schemas with additional information extracted from human-readable text. This makes much more sense. If you want to post the actual URL or reference that contains the "NLP interrogation" phrase, I should be able to comment more specifically.
Edit:
The project grant abstract you referenced does not contain much in the way of details, but they have this sentence:
... The information embedded in the parametric 3D model is intended for facility or workplace management using appropriate software. However, this information also has the potential, when combined with IoT sensors and cognitive computing, to be utilised by healthcare professionals in Ambient Assisted Living (AAL) environments. This project will examine how as-constructed BIM models of healthcare facilities can be interrogated via natural language processing to support AAL. ...
I can only speculate on the following reason for possibly using an NLP framework for this purpose:
While BIM models include Industry Foundation Classes (IFCs) and aecXML, there are many dozens of other formats, many of them proprietary. Some are CAD-integrated and others are standalone. Rather than pay for many proprietary licenses (some of these enterprise products are quite expensive), and/or spend the time to develop proper structured query behavior for the various diverse file format specifications (which may not be publicly available in proprietary cases), the authors have chosen a more automated, general solution to extract the content they are looking for (which I assume must be textual or textual tags in nearly all cases). This would almost be akin to a search engine "scraping" websites and looking for key words or phrases and synonyms to them, etc. The upside is they don't have to explicitly code against all the different possible BIM file formats to get good coverage, nor pay out large sums of money. The downside is they open up new issues and considerations that come with NLP, including training, validation, supervision, etc. And NLP will never have the same level of accuracy you could obtain from a true structured query against a known schema.
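Purely as an illustration of the "scraping for key words and synonyms" idea speculated about above (and not a description of what the project actually does), a naive version might look like this. The synonym table and the tags are invented.

```python
# Hypothetical synonym table; a real system would learn or curate this.
SYNONYMS = {
    "door": {"door", "doorway", "entrance"},
}

# Free-text tags as they might be pulled out of assorted model files.
TAGS = ["Main Entrance", "Fire door 2", "Corridor wall", "Doors-East"]


def matching_tags(query_term, tags):
    """Return tags containing the query term or one of its synonyms as a word."""
    wanted = SYNONYMS.get(query_term, {query_term})
    return [tag for tag in tags
            if any(word in wanted for word in tag.lower().split())]


print(matching_tags("door", TAGS))  # ['Main Entrance', 'Fire door 2']
```

Note that "Doors-East" is missed because the naive word split does not handle the plural or the hyphen. That kind of gap is exactly the accuracy cost compared with a structured query against a known schema.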

Other than a class diagram is there another way (with or without UML) to model an MVC web application design?

I'm building a Rails application and I'm having trouble working out how to create diagrams for the application architecture.
I've created UML class diagrams in the past, so consequently that's where I headed. I've found the railroady gem that generates UML class diagrams via a rake task, however it separates the models from the controllers - which feels fragmented to me.
What I want to know is whether there is another (preferably better) way to model an MVC (rails) web application.
I'm not necessarily looking for a gem to generate the diagram for me, I'm happy to create it manually in visio, I just don't know what type of diagram I should be using.
You may want to try the Robustness diagram, also sometimes called MVC diagram.
See for example here and there.
It is not really a UML diagram, but most UML tools manage it through stereotypes and custom icons. The tool I use, Magicdraw UML, uses a class diagram, but I think I heard of tools that use communication diagrams (not sure, though).
However, it may or may not meet your expectations, as it is a very global diagram.
There exists a methodology named UWA (Ubiquitous Web Application) that allows you to describe not only the data structure, but also the navigation, presentation and transaction models.
The UWA methodology has a user-centered approach, which improves the requirement and design definitions. Since this methodology was developed specifically for modelling web applications, it allows a clear separation of content, navigation, transaction, publishing and operational elements.
UWA begins with goal-oriented requirements engineering that leads naturally into the later design stages, revealing key features that should be implemented. This prompts reasoning about requirements that might not have been identified beforehand, or that may have been underrated.
You may find additional information about UWA here.
Even if you decide not to apply this methodology, it may provide you with some tips about adapting UML diagrams to web applications.
Have you ever come across a Use Case Diagram before? It's not strictly a diagram to outline a system's architecture, but it does provide a good visual representation of communication with other parts of the system/external actors during a given "use case" (or process).
For example:
User(Actor) --> Update Status(Use Case)--includes-->(Log in)
Here we have a user wanting to update their status. In order to do this, they need to be logged into the site (an included use case). Thinking about this in MVC terms, we know that "Update Status" and "Log in" would both be controller methods, which would both communicate with the attached website database (also an actor), thus demonstrating the communication path within a system.
Actors of a system can be anything that communicates with the actual system during a process, usually externally, so users, browsers, database, clients etc.
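If it helps to see the example as data rather than a picture, here is a small sketch of the same use case. It treats "includes" as a prerequisite, as the example above does; the `UseCase` class and the idea of listing steps in order are assumptions of this sketch, not part of UML itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UseCase:
    name: str
    includes: list[UseCase] = field(default_factory=list)

    def steps(self) -> list[str]:
        """List included use cases first, then this one."""
        ordered = []
        for included in self.includes:
            ordered.extend(included.steps())
        ordered.append(self.name)
        return ordered


log_in = UseCase("Log in")
update_status = UseCase("Update Status", includes=[log_in])

# Actor -> use cases it initiates, mirroring the arrow in the example.
actors = {"User": [update_status]}

for actor, use_cases in actors.items():
    for use_case in use_cases:
        print(actor, "->", " -> ".join(use_case.steps()))
```

Each name in `steps()` is a candidate controller method, which is how the use case view feeds into the class diagram.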
In terms of modelling the MVC architecture, this is done best by the Class diagram, but a Use Case diagram would also aid in the visual representation.
I always draw up a Use Case and Class Diagram together before I start coding, as a way of extracting the system requirements and laying them out in a working design. UML diagrams are design tools after all- there's not really much point in creating one after you've written the system code!
Just something to think about anyway- hope this helps!
brief overview of basic use-case diagrams

Comparison of OData and Semantic Web/Linked Data

I'm trying to get my head around two very different approaches to data sharing: OData and Semantic Web/Linked Data. Is there a good comparison of the two?
As I understand it, OData combines syndication/CRUD (AtomPub), serialisation formats (XML, JSON), a data model, a query language, and some semantics/conventions governing use of those existing technologies. It's primarily intended for exposing data from one system so that others can consume it.
Linked Data is a data model, a rigorous commitment to URIs, an (optional?) serialisation format (RDF/XML), but (correct me if I'm wrong) doesn't say anything about transport, CRUD, etc. It seems intended to allow inferencing across lots of little chunks of data drawn from a wide variety of sources. (Not something of major importance to us right now - we would be synchronising large slabs of data between a small number of sources, and wanting to preserve provenance information).
I'm interested in technologies for sharing data between certain data management platforms, some of which I work on directly. OData seems more appealing as it's very straightforward to explain to developers: implement this API, follow that Atom standard, serialise the data like this. We're already doing something very similar for one platform: sharing XML-serialised data on an Atom feed, with URL parameters used to filter.
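For comparison with the "URL parameters used to filter" approach, this is roughly what the same thing looks like with OData's `$filter` and `$top` query options. The service URL and the `Site` and `Depth` fields are hypothetical; only the option names and the `eq`/`gt`/`and` operators are OData conventions.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Hypothetical OData entity set exposed by one of the platforms.
SERVICE_ROOT = "https://example.org/odata/Samples"


def odata_url(filter_expr: str, top: int) -> str:
    """Build a query URL using the $filter and $top system query options."""
    params = {"$filter": filter_expr, "$top": str(top)}
    # Keep "$" readable in option names; percent-encode everything else.
    return SERVICE_ROOT + "?" + urlencode(params, safe="$", quote_via=quote)


url = odata_url("Site eq 'North' and Depth gt 10", 5)
print(url)

# Round-trip check: the server side would see the original expression.
print(parse_qs(urlsplit(url).query))
```

This is the "implement this API" part that is easy to explain to developers: the filter is a small expression language carried in an ordinary query string.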
By contrast, my past experiences working with RDF have given me an impression of brittle, opaque (massive slabs of RDF/XML), inaccessible (using SPARQL vs SQL) technology - but perhaps I'm confusing the experience of working with a triplestore like Jena with simply exposing an existing database via a linked data API.
Any pointers, comments etc on the differences and similarities between these two approaches in terms of scope, technologies, ease, future potential etc would be great.
I think discussing this in depth is not really what Stackoverflow is meant for, but just to give you some pointers to interesting discussions about differences and overlap:
Oh - it is data on the Web
Microsoft, OData and RDF
One of the key differences seems to be that OData has no means to link data from different sources to each other. Essentially, you're still stuck in a silo.
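A tiny sketch of what "linking data from different sources" means in the RDF model: because both sources identify the site by the same URI, merging is just a set union, and a question can be answered across both. Plain tuples stand in for a triple store here, and all URIs and values are made up.

```python
SAMPLE = "http://example.org/sample/42"
SITE = "http://example.org/site/north"
COLLECTED_AT = "http://example.org/vocab/collectedAt"
LABEL = "http://example.org/vocab/label"

# Source A knows where the sample was collected.
source_a = {(SAMPLE, COLLECTED_AT, SITE)}
# Source B, published independently, knows what the site is called.
source_b = {(SITE, LABEL, "North Reef")}

# Shared URIs make the merge trivial: no key mapping, just a union.
merged = source_a | source_b


def objects(graph, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


sites = objects(merged, SAMPLE, COLLECTED_AT)
labels = {label for site in sites for label in objects(merged, site, LABEL)}
print(labels)  # {'North Reef'}
```

Neither source alone can answer "what is the name of the site where sample 42 was collected?", which is the silo problem in miniature.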
It might also be interesting to check out various attempts to convert data between the two approaches. See, among others, http://answers.semanticweb.com/questions/1298/has-anyone-written-a-mapping-from-odata-to-rdf .
OData may be easier, but it's not better, by any means. SPARQL and RDF (forget RDF/XML, better to look at Turtle) satisfy everything in OData along with providing many more cutting-edge features such as:
Federation Extensions
Linked Data
Reasoning and Inference (for the more brave)
Equally, the software supporting the standards is actually quite sophisticated. Most people interested in OData generally come from a Microsoft background, so take a look at dotNetRdf
Here's a comparison matrix:
http://uoccou.wordpress.com/2011/02/17/linked-data-odata-gdata-datarss-comparison-matrix/
Unfortunately the table formatting is pretty horrible, but the content is useful.
