Data Layer Convention - swagger

I am currently defining a data layer definition/convention to be used at a large organisation.
Every time someone defines new tags or collects some sort of information from a web page, they should follow the convention.
It covers variable naming, values, type descriptions, and when to use each variable.
The convention will later be used with GTM/Tealium iQ, but it should be tool agnostic.
What is the best way, from a technical perspective, to define the data layer schema? I am considering Swagger or JSON Schema. Any thoughts?

It's important to define your data layer in a way that works for your organisation. That said, the best data layers have an easy-to-understand naming convention, are generally not nested, and contain good-quality data.
A good tag manager will be able to read your data layer in whatever format you like, whether out of the box or via a converter that runs before tag execution.
Here is Tealium's best practice:
https://community.tealiumiq.com/t5/Data-Layer/Data-Layer-Best-Practices/ta-p/15987
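As a concrete illustration, if you went the JSON Schema route, a small slice of the data layer could be described like this (the property names page_name and order_total are made-up examples, not prescribed by any standard):

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Data layer",
  "type": "object",
  "properties": {
    "page_name": {
      "type": "string",
      "description": "Human-readable page name; set on every page load."
    },
    "order_total": {
      "type": "number",
      "description": "Order value in the site currency; set only on purchase confirmation."
    }
  },
  "required": ["page_name"]
}

Because JSON Schema is machine-readable and tool agnostic, the same file can back your documentation, validate sample payloads in CI, and drive the conventions you enforce in GTM or Tealium iQ.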

Related

How to store business logic in a database?

I'd like to allow users to define simple business logic such as:
if (x and y) then ...
if (z or w) then ...
Let me put it concretely:
I'm developing an HR module that answers whether applicants fulfill some requirements, which are to be defined by users.
Such requirements can be defined around logical operators:
(Must be 18 years old) or (under 18 years and must have parents permission)
Is putting this kind of logic inside the database OK? I think it is, but I'm afraid of spending time on this and finding that it's a poor approach.
It's OK. It is a flexible approach, although time-consuming in development.
Furthermore, you don't have to create your own DSL; it's already done, e.g. json-logic-ruby allows you to keep complex rules in JSON.
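For example, the requirement from the question could be stored roughly like this in JsonLogic form (the variable names age and parental_permission are illustrative):

{
  "or": [
    { ">=": [ { "var": "age" }, 18 ] },
    { "and": [
      { "<":  [ { "var": "age" }, 18 ] },
      { "==": [ { "var": "parental_permission" }, true ] }
    ] }
  ]
}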
As so often, the answer is "it depends" ;) Since it seems that the logic in this case is user-defined data, putting it into the database is absolutely reasonable.
But if you are looking to model the structure/AST of this input into separate business objects with their and and or control flow reflected in the database records, I would have to say that it's very likely overkill and will - apart from the initial implementation overhead - make future refactoring very hard.
A simple text field that will be evaluated at runtime is the easiest way to go as its contents can be very easily extracted and reasoned about.
Not knowing your definitive requirements, I'd suggest you take a look at Drools, a rules engine for Java, which also has a rule storage backend and guided editor in its ecosystem. Incidentally, the example in your question looks a lot like it might benefit from a rules engine, but unfortunately I don't have any practical experience with any of the related Ruby libraries.
Otherwise, this article on the thoughtbot blog - Writing a Domain Specific Language in Ruby - might be helpful in this context, too.
I definitely think it's okay. Because the user is defining the business logic or rules, I'd recommend splitting the business logic form field into parts (rule: if/unless, operand1: user.age, operand2: permissions.parental, operator1: and, operator2: greater_than, ...) and then storing each business logic object as a row in a serialized JSON column. This should make them easier to validate and less error prone compared to a single text field where the user can enter whatever they like.
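A sketch of what one such serialized rule object might look like (field names loosely follow the parts above and are illustrative):

{
  "rule": "if",
  "combinator": "or",
  "clauses": [
    { "operand": "user.age", "operator": "greater_than_or_equal", "value": 18 },
    { "operand": "permissions.parental", "operator": "equals", "value": true }
  ]
}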
I would suggest creating a simple table to store the logic if it is predictable.
For example:
Table: business_logics
Attributes:
opt_1: decimal
opt_2: decimal
logic_opt: integer (enum: and|or)
then_statement: string
This way it is extendable when you add more logic_opt values someday, and you get the advantage of easier validation and refactoring later on. Allowing users to input free text is risky in your case!
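A minimal sketch of such a table in SQL (the integer encoding of the enum is an assumption):

CREATE TABLE business_logics (
  id             INTEGER PRIMARY KEY,
  opt_1          DECIMAL NOT NULL,
  opt_2          DECIMAL NOT NULL,
  logic_opt      INTEGER NOT NULL,      -- enum, e.g. 0 = and, 1 = or
  then_statement VARCHAR(255) NOT NULL
);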

Does the coder we select significantly affect performance?

I'm having trouble understanding the purpose of "coders". My understanding is that we choose coders in order to "teach" Dataflow how a particular object should be encoded in byte format and how equality and hash codes should be evaluated.
By default, and perhaps by mistake, I tend to add "implements Serializable" to almost all my custom classes. This has the advantage that Dataflow tends not to complain. However, because some of these classes are huge objects, I'm wondering if performance suffers, and whether I should instead implement a custom coder in which I specify exactly which one or two fields can be used to determine equality, hash code, etc. Does this make sense? Put another way, does creating a custom coder (which may only use one or two small primitive fields) instead of the default serialization-based coder improve performance for very large classes?
Java serialization is very slow compared to other forms of encoding, and can definitely cause performance problems. However, only serializing part of your object means that the rest of the object will be dropped when it is sent between processes.
Much better than using Serializable, and pretty much just as easy: you can use AvroCoder by annotating your classes with
@DefaultCoder(AvroCoder.class)
This will automatically deduce an Avro schema from your class. Note that this does not work for generic types, so you'll likely want to use a custom coder in that case.
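For illustration, a minimal sketch (the class and its fields are made up, and the import paths vary between the Google Dataflow SDK and Apache Beam versions):

import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.coders.DefaultCoder;

// The SDK derives an Avro schema from the fields below, avoiding
// slow Java serialization entirely.
@DefaultCoder(AvroCoder.class)
public class UserEvent {
    public String userId;
    public long timestampMillis;

    // AvroCoder needs a no-arg constructor to create instances
    // while decoding.
    public UserEvent() {}
}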

Map BPMN to WSDL

My task is to take a BPMN 2.0 XML file and map it as well as possible (with a certain error rate) to available web services. For example, when my BPMN file describes the process of buying a pizza, I give 10€ and get back 1 pizza. The tool should then map that BPMN to the web service that needs an input of type int with the name "money", etc.
How is that even possible? I have searched for a few hours now and came up with the following:
I found https://github.com/camunda/camunda-bpm-platform and can easily use it to parse a plain .bpmn file into a Java object structure which I can then query. Easy.
After parsing the XML notation, I should analyze it and search for elements that input data and elements that output data, since these are the only things I can map to WSDL (a WSDL only describes the structure of the web service: names of variables, types of variables, number of variables). Problem: I cannot find any 1:1 elements I can easily declare as "when this BPMN element is used, it 100% means that the process is getting some input named x". What should I do here? What can I map?
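For reference, a sketch of how querying the parsed model for data inputs and outputs might look with the Camunda BPMN model API (the file name pizza.bpmn is hypothetical, and the exact instance classes available depend on the library version):

import org.camunda.bpm.model.bpmn.Bpmn;
import org.camunda.bpm.model.bpmn.BpmnModelInstance;
import org.camunda.bpm.model.bpmn.instance.DataInput;
import org.camunda.bpm.model.bpmn.instance.DataOutput;
import java.io.File;

public class BpmnIoScan {
    public static void main(String[] args) {
        // Parse the plain .bpmn file into a queryable object structure.
        BpmnModelInstance model = Bpmn.readModelFromFile(new File("pizza.bpmn"));
        // Data inputs/outputs declared in ioSpecifications are the natural
        // candidates to match against WSDL message parts.
        for (DataInput in : model.getModelElementsByType(DataInput.class)) {
            System.out.println("input: " + in.getAttributeValue("name"));
        }
        for (DataOutput out : model.getModelElementsByType(DataOutput.class)) {
            System.out.println("output: " + out.getAttributeValue("name"));
        }
    }
}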
I found WS-BPEL. As far as I understand, I can somehow translate BPMN to WS-BPEL, which should model the process better and be more easily mappable to a WSDL (?). Camunda, however, doesn't offer this functionality, and I am restricted to open source software.
Any suggestions on what I should do?

Is it possible to write an F# type provider for linked data?

I really like the Freebase and World Bank type providers and I would like to learn more about type providers by writing one on my own. The European Union has an open data program where you can access data through SPARQL/Linked data. Would it be possible to wrap data access to open EU data by means of a type provider or will it be a waste of time trying to figure out how to do it?
Access to EU data is described here: http://open-data.europa.eu/en/linked-data
I think it is certainly possible - I talked with some people who are actually interested in this (and are working on it, but I'm not sure what the current status is). Anyway - I definitely think this is such a broad area that an additional effort would not be a waste of time.
The key problem with writing a type provider for RDF-like data is deciding what to treat as types (what should become the name of a type or a property) and what should be left as values (returned as a list of key-value pairs). This is quite obvious for WorldBank - names of countries & properties become types (property names) and values become data. But for a triple-based data set, this is less obvious.
So far, I think there are two approaches:
Additional ontology - require that the data source comes with some additional ontology that specifies the keys for navigation. There is something called a "facet ontology", which is used on http://mspace.fm and might be quite interesting.
Parameterization - parameterize the type provider (in some way) and give it a list of relations that should become available at the type level (and you would probably also need to provide some root where to start).
There are definitely other possibilities - and I think having provider for linked data would be really interesting. If you wanted to do this for F# Data, there is a useful page on contributing :-).

How to design a set of file readers and writers for different format

Digging into a legacy project (C++) that needs to be extended, I realized that there are about 40 reader/writer/parser classes. They are used to read and write various types of data (different objects) in different file formats (binary, HDF5, XML, text, ...); one type of object is typically bound to one or two file formats. Most of the classes have no knowledge of the others. Interfaces and inheritance were apparently unknown to the original author, as were design patterns.
It seems to me a horrendous mess. On the other hand, I am not exactly sure how to handle this situation. I will at least extract interfaces. I would also like to see if I can pull common code into some parent classes, for example whatever is specific to an HDF5 reader/writer. I also thought that the Abstract Factory pattern could help, but the objects I get out of the readers are completely different.
How would you handle this situation? How would you design the classes? What design pattern would you use, if any? Would you keep the reading and writing parts split?
The Abstract Factory pattern isn't the right track. You usually only need interfaces if you anticipate multiple implementations for a given file type and want both to operate the same way.
Question: can one class be written to multiple file types? As in, object 'a' (of type Class A) potentially needs to be written to either/both XML or text formats?
If that is true, you need to decouple the classes from the readers/writers. Take a look at this question: What design pattern should I use for import/export?
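For illustration, a minimal sketch of that decoupling (shown here in Java for brevity; the class A and its field are hypothetical stand-ins for the legacy types):

import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical domain object from the legacy code base.
class A {
    String payload = "example";
}

// One small interface; one implementation per (type, format) pair,
// so the domain object knows nothing about serialization.
interface Writer<T> {
    void write(T object, OutputStream out) throws IOException;
}

class XmlAWriter implements Writer<A> {
    public void write(A a, OutputStream out) throws IOException {
        out.write(("<a>" + a.payload + "</a>").getBytes(StandardCharsets.UTF_8));
    }
}

class TextAWriter implements Writer<A> {
    public void write(A a, OutputStream out) throws IOException {
        out.write(a.payload.getBytes(StandardCharsets.UTF_8));
    }
}

Readers would get a mirror-image interface, which keeps the reading and writing parts split while still letting common format plumbing (e.g. the HDF5-specific code) live in shared base classes.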
