I am working on about 100 APIs in OpenAPI, with about 200 to 300 component schemas. Some of them are referenced in multiple places, which reduces redundancy. But I get confused when changing them, because a single change is reflected everywhere the schema is referenced, which is not always what I want. Is there any way in Swagger to see where a schema is referenced, so I can avoid undesired changes?
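One way to answer the "where is this schema referenced?" question outside of any Swagger tooling is to walk the document and collect every `$ref`. The following is only a minimal sketch, assuming the spec has already been loaded into a Python dict (for example with `json.load`) and that all references are local ones of the form `#/components/schemas/Name`; the `find_schema_references` helper and the tiny spec are hypothetical.

```python
from collections import defaultdict

SCHEMA_PREFIX = "#/components/schemas/"


def find_schema_references(spec):
    """Map each referenced component schema name to the JSON pointers that use it."""
    usages = defaultdict(list)

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "$ref" and isinstance(value, str) and value.startswith(SCHEMA_PREFIX):
                    usages[value[len(SCHEMA_PREFIX):]].append(path)
                else:
                    # Escape "~" and "/" in keys, as JSON Pointer requires.
                    escaped = str(key).replace("~", "~0").replace("/", "~1")
                    walk(value, f"{path}/{escaped}")
        elif isinstance(node, list):
            for index, item in enumerate(node):
                walk(item, f"{path}/{index}")

    walk(spec, "#")
    return dict(usages)


# Tiny hypothetical spec, just enough to exercise the walker.
spec = {
    "paths": {
        "/pets": {"get": {"responses": {"200": {"content": {"application/json": {
            "schema": {"$ref": "#/components/schemas/Pet"}}}}}}}
    },
    "components": {"schemas": {
        "Pet": {"type": "object",
                "properties": {"owner": {"$ref": "#/components/schemas/Owner"}}},
        "Owner": {"type": "object"},
    }},
}

usages = find_schema_references(spec)
for name, pointers in sorted(usages.items()):
    print(name, pointers)
```

Running this before editing a schema lists every place that would be affected by the change. References into other files are not followed by this sketch.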
Related
I have an interesting problem that I don't know how to solve.
I have collected a large dataset of 80 million graphs (CFGs, as in control flow graphs, produced from programs I have analysed from GitHub) which I need to be able to search efficiently.
I looked into existing solutions like Neo4j but they are all designed to store a global single graph.
In my case it is the opposite: all graphs are independent, like rows in a table, but I need to search through all of them efficiently.
For example, I want to find all CFGs that have a particular IF condition or a WHILE loop with a particular condition.
What's the best database for this use case?
I don't think that there's a reason not to simply store all those graphs in a single graph, whether it's Neo4j or a different graph database. It's not a problem to have many disparate graphs in a single graph where the disparate graphs are disconnected from one another.
As for searching them efficiently, you would either (1) identify properties in your CFGs that you want to search on and convert them to some indexed value of the graph or (2) introduce some graph structure (additional vertices/edges) between the CFGs that will allow you to do the searches you want via graph traversal.
Depending on what you need to search on, approach 1 may not be flexible enough for you, especially if what you intend to search on is not completely known at the time of loading the data. Also, it is important to note that with approach 2 you do not really lose the fact that you have 80 million distinct graphs just because you provided some connection between them. Those physical connections don't change that basic logical fact. You just need to consider those additional connections when you write traversals that you expect to occur only within a single CFG.
I'm not sure what Neo4j supports in this area, but with Apache TinkerPop (an open source graph processing framework that lets you write vendor-agnostic code over different graph databases, including Neo4j), you might consider some form of graph partitioning to help with approach 2. Or you might subgraph() the larger graph to contain only the CFG and then operate on that purely in memory when querying. Both of these approaches will help you bind your query to just the individual CFG you want to traverse.
Ultimately, however, I see this issue as a modelling problem. You will just need to make some choices on how to best establish the schema for your use case and virtually any graph database should be able to support that.
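To make the modelling idea concrete, here is a minimal in-memory sketch (plain Python, not Neo4j or TinkerPop code) of many disconnected CFGs living in one store. Each node carries the id of the CFG it belongs to, and searchable properties are indexed as in approach 1. The `CfgStore` class, the node kinds and the condition strings are all hypothetical.

```python
from collections import defaultdict


class CfgStore:
    """Many independent CFGs in one store; every node is tagged with its graph id."""

    def __init__(self):
        self.nodes = {}                       # node_id -> properties
        self.edges = defaultdict(set)         # node_id -> successor node ids
        self.by_condition = defaultdict(set)  # (kind, condition) -> graph ids

    def add_node(self, node_id, graph_id, kind, condition=None):
        self.nodes[node_id] = {"graph": graph_id, "kind": kind, "condition": condition}
        if condition is not None:
            # Approach 1: index the property we expect to search on.
            self.by_condition[(kind, condition)].add(graph_id)

    def add_edge(self, src, dst):
        self.edges[src].add(dst)

    def graphs_with(self, kind, condition):
        """Ids of all CFGs containing a node of this kind with this condition."""
        return set(self.by_condition.get((kind, condition), set()))

    def subgraph(self, graph_id):
        """Node ids of a single CFG, so a traversal can be confined to it."""
        return {n for n, props in self.nodes.items() if props["graph"] == graph_id}


store = CfgStore()
store.add_node("a1", "cfg-1", "IF", "x > 0")
store.add_node("a2", "cfg-1", "RETURN")
store.add_edge("a1", "a2")
store.add_node("b1", "cfg-2", "WHILE", "i < n")
store.add_node("b2", "cfg-2", "IF", "x > 0")
store.add_edge("b1", "b2")

matches = store.graphs_with("IF", "x > 0")
```

The graphs stay logically separate because of the graph id on each node, even though they share one store. A real graph database would replace the dictionaries with indexed vertex properties.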
I am building an iOS application that will randomly generate sentences (think Mad Libs) where the data used for generation is in multiple tables. This will be used to generate scenarios for training lifeguards. Each table contains an item name, the words that will be used when selected, and different values that determine what can go together.
Using two of the 10 tables shown above, the application may pick a location of Deep Water. Then it needs to pick an appropriate activity for in the water, such as Breath holding, but not Running.
I have been looking at Core Data for storage, but that seems to be more for data that the user changes often, and users would never change the stored data here. I do want to be able to update the tables myself fairly easily. What would be the optimal solution for this? The ways I can think of are:
Some kind of SQL DB, though again my tables aren't changing and aren't really relational.
2-D arrays written into the source code. Not pretty to work with or read, but my knowledge of regex makes converting from TSV to array fairly easy.
TSV files attached to the project. Better organized, but it would take some research on how to access them.
Some other method Apple has that I do not know about.
We have to create a rather large Ruby on Rails application based on a large database. This database is updated daily, each table has about 500 000 records (or more) and this number will grow over time. We will also have to provide proper versioning of all data along with referential integrity. It must be possible for a user to move from version to version, which are a kind of "snapshot" of the main database at different points in time. In addition, some portions of the data need to be served to other external applications through an API.
Considering the large amount of data, we thought of splitting the database into pieces:
State of the data at present time
Versioned attributes of each table
Snapshots of the first database at specific, historical points in time
Each of those would have its own application, creating a service with an API to interact with the data. This is needed as we don't want to create multiple applications connecting to multiple databases directly.
The question is: is this the proper approach? If not, what would you suggest?
We've never had any experience with a project of this magnitude and we're trying to find the best possible solution. We don't know if this kind of data separation makes any sense. If it does, how should we provide proper communication between the different applications and the individual services, and between the services themselves, as this will also be required?
In general the amount of data in the tables should not be your first concern. In PostgreSQL you have a very large number of options to optimize queries against large tables. The larger question has to do with what exactly you are querying, when, and why. Your query loads are always larger concerns than the amount of data. It's one thing to have ten years of financial data amounting to 4M rows. It's something different to have to aggregate those ten years of data to determine what the balance of the checking account is.
In general it sounds to me like you are trying to create a system that will rely on such aggregates. In that case I recommend the following approach, which I call log-aggregate-snapshot. In this, you have essentially three complementary models which work together to provide an up-to-date, well-performing solution. However, the restrictions on this are important to recognize and understand.
Event model. This is append-only, with no updates to the events themselves. In this model rows are inserted, and the only updates are to metadata used by some queries, and only when absolutely needed. For a financial application this would be the tables representing the journal entries and lines.
The aggregate closing model. This is append-only (though deletes are allowed for purposes of re-opening periods). This provides roll-forward information for specific purposes. Once a closing entry is in, no entries can be made for a closed period. In a financial application, this would represent closing balances. New balances can be calculated by starting at an aggregation point and rolling forward. You can also use partial indexes to make it easier to pull just the data you need.
Auxiliary data model. This consists of smaller tables which do allow updates, inserts, and deletes, provided that integrity with respect to the other models is not compromised. In a financial application this might be things like customer or vendor data, employee data, and the like.
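The roll-forward idea can be illustrated with a small sketch. This is plain Python over in-memory data rather than PostgreSQL, and the account names, dates and amounts are made up. A balance is computed by starting from the most recent closing balance and adding only the journal lines after it.

```python
from datetime import date

# Event model: append-only journal lines (entry date, account, amount).
journal = [
    (date(2023, 11, 10), "checking", 400),
    (date(2023, 12, 20), "checking", 600),
    (date(2024, 1, 5), "checking", -250),
    (date(2024, 1, 7), "savings", 50),
    (date(2024, 1, 20), "checking", 100),
]

# Aggregate closing model: closing balance per (account, period end).
closings = {
    ("checking", date(2023, 12, 31)): 1000,
}


def balance(account, as_of):
    """Balance of an account on a date, rolled forward from the latest closing."""
    start, total = date.min, 0
    # Find the most recent closing point on or before the requested date.
    for (acct, closed_on), closing_balance in closings.items():
        if acct == account and start < closed_on <= as_of:
            start, total = closed_on, closing_balance
    # Roll forward: add only the journal lines after that closing point.
    for entry_date, acct, amount in journal:
        if acct == account and start < entry_date <= as_of:
            total += amount
    return total


current = balance("checking", date(2024, 1, 31))
```

With a closing entry in place, the query only touches journal lines from the open period, which is what keeps it fast as history grows.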
I'm building an application where I will be gathering statistics from a game. Essentially, I will be parsing logs where each line is a game event. There are around 50 different kinds of events, but a lot of them are related. Each event has a specific set of values associated with it, and related events share a lot of these attributes. Overall there are around 50 attributes, but any given event only has around 5-10 attributes.
I would like to use Rails for the backend. Most of the queries will be event type related, meaning that I don't especially care about how two event types relate to each other in any given round, as much as I care about data from a single event type across many rounds. What kind of schema should I be building and what kind of database should I be using?
Given a relational database, I have thought of the following:
Have a flat structure, where there are only a couple of tables, but the events table has as many columns as there are overall event attributes. This would result in a lot of nulls in every row, but it would let me easily access what I need.
Have a table for each event type, among other things. This would let me save space and improve performance, but it seems excessive to have that many tables given that events aren't really separate 'ideas'.
Group related events together, minimizing both the number of tables and the number of attributes per table. The problem then becomes the grouping. It is far from clear cut, and it could take a long time to properly establish event supertypes. Also, it doesn't completely solve the problem of there being a fair number of nulls.
It was also suggested that I look into using a NoSQL database, such as MongoDB. It seems very applicable in this case, but I've never used a non-relational database before. It seems like I would still need a lot of different models, even though I wouldn't have tables for each one.
Any ideas?
This feels like a great use case for MongoDB and a very awkward fit for a relational database.
The types of queries you would be making against this data are key to the best schema design, but imagine that your documents (in a single collection similar to 1. above) look something like this:
{ "round" : 1,
"eventType": "et1",
"attributeName": "attributeValue",
...
}
You can easily query by round, by eventType, getting back all attributes or just a specified subset, etc.
You don't have to know up front how many attributes you might have, which ones belong with which event types, or even how many event types you have. As you build your prototype/application you will be able to evolve your model as needed.
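To show what that flexibility looks like in practice, here is a plain-Python stand-in for a document collection. It is not the MongoDB API, just a sketch of the same idea: filter by equality on any fields, optionally return a subset of attributes. The `find` helper, event types and attribute names are hypothetical.

```python
def find(documents, query, fields=None):
    """Return documents whose values equal every key/value pair in query.

    If fields is given, keep only those attributes (when present).
    """
    matches = [d for d in documents if all(d.get(k) == v for k, v in query.items())]
    if fields is None:
        return matches
    return [{k: d[k] for k in fields if k in d} for d in matches]


# Documents in one collection; each event type carries its own attributes.
events = [
    {"round": 1, "eventType": "kill", "attacker": "p1", "victim": "p2", "weapon": "rifle"},
    {"round": 1, "eventType": "purchase", "player": "p2", "item": "armor", "cost": 650},
    {"round": 2, "eventType": "kill", "attacker": "p2", "victim": "p1", "weapon": "pistol"},
]

kills = find(events, {"eventType": "kill"}, fields=["round", "weapon"])
round_one = find(events, {"round": 1})
```

With a driver such as pymongo, the first lookup would correspond roughly to `collection.find({"eventType": "kill"}, {"round": 1, "weapon": 1})`, where the second argument is the projection.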
There is a very large active community of Rails/MongoDB folks and there's a good chance that you can find a lot of developers you can ask questions and a lot of code you can look at as examples.
I would encourage you to try it out, and see if it feels like a good fit. I was going to add some links to help you get started but there are too many of them to choose from!
Since you might have a question about whether to use an object mapper or not, here's a good answer to that.
A good write-up of dealing with dynamic attributes with Ruby and MongoDB is here.
I believe several of us have already worked on a project where not only the UI but also the data has to be supported in different languages. For instance, being able to provide and store a translation of what I'm writing here.
What's more, I also believe several of us have some time-triggered events (such as expiring membership access) where the user's location should be taken into account to calculate, say, midnight in the right time zone.
Finally, there's also the need to support right-to-left user interfaces for certain languages, and the use of different encodings when reading submitted data files (parsing text and Excel data, for instance).
Currently I'm storing all the translations for all my entities in a single table (not so practical, as it is very hard to find your way around when running SQL queries to look into a problem), keeping UI translations mainly in satellite assemblies, and supporting neither time zones nor right-to-left design.
What are your experiences when dealing with these challenges?
[Edit]
I assume most people think this level of multicultural requirement only arises in huge projects. As a matter of fact, consider an online survey where:
Answers will be collected only until midnight
Questionnaire definition and part of the answers come from a text file (in any language) as well as translations
Questions and response options must be displayed in several languages, according to who is accessing it
Reports also have to be shown and generated in several different languages
As one can see, we do not have to go too far in an application to have these kinds of requirements.
[Edit2]
Just found out my question is a duplicate
i18n in your projects
The first answer (when ordering by vote) is so comprehensive I have to get at least a part of it implemented someday.
Be very very cautious. From what you say about the i18n features you're trying to implement, I wonder if you're over-reaching.
Notice that the big web applications (e.g. eBay, amazon.com, Yahoo, BBC) actually deliver a separate app for each language they want to support. Each of these web applications does consume a common core set of services. Don't be surprised if the business needs of two different countries that even speak the same language (e.g. UK & US) are different enough that you do need a separate app for each.
On the other hand, you might need to become like the next amazon.com. It's difficult to deliver a successful web application in one language, much less many. You should not be afraid to favor one user population (say, your Asian-language speakers) over others if this makes sense for your web app's business needs.
Go slow.
Think everything through, then really think about what you're doing again. Bear in mind that the more you add (like Right to Left) the longer your QA cycle will be.
The primary piece to your puzzle will be extensive use of interfaces on the code side, and either one data source that gets passed through a translator to whichever languages need to be supported, or separate data sources for each language.
The time issues can be handled by the interfaces, because presumably you will want things to function in the same fashion, but differ in the implementation details. To a large extent, a similar thought process can be applied to the creation of the interface when adjusting it to support differing languages. When you get down to it, skinning is exactly this, where the content being skinned is the interface, and the look/feel is the implementation.
Do what your users need. For instance, most programmers understand English, so there is no sense in translating posts on this site. If many of your users need a translation, add a new table column with the language id, and another column to link a translated row to its original. If your target audience includes users from the Middle East, implement right-to-left. If time precision matters down to the hour, add a time zone column to the user table, and so on.
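As an illustration of what a stored time zone buys you, the sketch below computes a user's next local midnight in UTC, which is the kind of value an expiry job would compare against. It assumes the time zone column can be turned into a `tzinfo` object; the function name and the sample user are hypothetical, and a fixed UTC offset is used only to keep the example self-contained.

```python
from datetime import datetime, time, timedelta, timezone


def next_local_midnight_utc(now_utc, user_tz):
    """Return the user's next local midnight, expressed in UTC."""
    local_now = now_utc.astimezone(user_tz)
    next_day = local_now.date() + timedelta(days=1)
    local_midnight = datetime.combine(next_day, time.min, tzinfo=user_tz)
    return local_midnight.astimezone(timezone.utc)


# Hypothetical user five hours behind UTC.
user_tz = timezone(timedelta(hours=-5))
now_utc = datetime(2024, 1, 15, 3, 0, tzinfo=timezone.utc)

# It is still 14 January, 22:00 for this user, so access expires
# at their upcoming midnight rather than at midnight UTC.
expires_at = next_local_midnight_utc(now_utc, user_tz)
```

In a real application the stored value would more likely be an IANA zone name, and `zoneinfo.ZoneInfo("America/New_York")` (Python 3.9+) could be passed as `user_tz` so that daylight saving changes are handled.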
If you're on *NIX, use gettext. Most languages I've used have some level of support; PHP's is pretty good, for instance.
I'll describe what has been done in my project (it wasn't my original architecture but I liked it anyways)
Providing Translation Support
Text which needs to be translated has been divided into three different categories:
Error text: Like errors which happen deep in the application business layer
UI Text: Text which is shown in the User interface (labels, buttons, grid titles, menus)
User-defined Text: text which needs to be translatable according to the final user's preferences (that is - the user creates a question in a survey and he can also create a translated version of that survey)
For each category, the scheme used to provide the translation service is different, so we have:
Error Text: A library with static functions which access resource files
UI Text: A "Helper" class which, linked to the view engine, provides translations from remote assemblies
User-defined Text: A table in the database which provides translations (according to typeID of the translated entity and object id) and is linked to the entity via a 1 x N relationship
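The user-defined text table could be sketched roughly as follows, using an in-memory SQLite database. The table and column names, the sample rows and the fallback to a default language are assumptions for illustration, not the actual schema of the project described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE translation (
        type_id   TEXT    NOT NULL,  -- kind of entity, e.g. 'question'
        object_id INTEGER NOT NULL,  -- id of the translated entity
        language  TEXT    NOT NULL,
        text      TEXT    NOT NULL,
        PRIMARY KEY (type_id, object_id, language)
    )
""")
conn.executemany("INSERT INTO translation VALUES (?, ?, ?, ?)", [
    ("question", 1, "en", "What is your name?"),
    ("question", 1, "pt", "Qual é o seu nome?"),
    ("question", 2, "en", "Any other comments?"),
])


def translated_text(type_id, object_id, language, default_language="en"):
    """Text in the requested language, else the default language, else None."""
    row = conn.execute(
        """SELECT text FROM translation
           WHERE type_id = ? AND object_id = ? AND language IN (?, ?)
           ORDER BY language = ? DESC
           LIMIT 1""",
        (type_id, object_id, language, default_language, language),
    ).fetchone()
    return row[0] if row else None


label = translated_text("question", 1, "pt")
```

One entity row maps to N translation rows, one per language, which matches the 1 x N relationship. The fallback keeps untranslated items visible in the default language.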
I haven't, however, attacked the other obvious problems such as dealing with time zones, different layouts and picture translation (if this is really necessary). Has anyone tackled this problem in a different way?
Has anyone ever tackled the other i18n problems?