Dealing with state data in an incremental migration from a monolithic legacy app [closed]

I have a very large monolithic legacy application that I am tasked with breaking into many context-bounded applications on a different architecture. My management is pushing for the old and new applications to work in tandem until all of the legacy functionality has been migrated to the current architecture.
Unfortunately, as is the case with many monolithic applications, this one maintains a very large set of state data for each user interaction and it must be maintained as the user progresses through the functionality.
My question is: what are some ways I can responsibly support a hybrid legacy/non-legacy architecture so that, in the future state, the new individual applications are not hopelessly dependent on this shared state model?
My initial thought is to write the state data to a cache of some sort that is accessible to both the legacy application and the new applications so that they may work in harmony until the new applications have the infrastructure necessary to operate independently. I'm very skeptical about this approach so I'd love some feedback or new ways of looking at the problem.
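Roughly, I'm imagining something like this, with Redis as a stand-in for the shared cache (the key scheme and the SharedUserState wrapper are made up for illustration):

require "redis"
require "json"

# A wrapper both the legacy app and the new applications could share, so the
# coupling to the interim state store stays in one place.
class SharedUserState
  def initialize(redis: Redis.new(url: ENV.fetch("SHARED_STATE_REDIS_URL", "redis://localhost:6379/0")))
    @redis = redis
  end

  # Store the state blob under a namespaced key with a TTL so abandoned
  # interactions eventually expire on their own.
  def write(user_id, state, ttl: 3600)
    @redis.set("user_state:#{user_id}", JSON.generate(state), ex: ttl)
  end

  def read(user_id)
    raw = @redis.get("user_state:#{user_id}")
    raw && JSON.parse(raw)
  end
end

Both sides would read and write through the same wrapper, so at least the dependency on the shared state is confined to one seam that can be removed later.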

Whenever I've dealt with this situation I take the dual-writes approach to the data, as it is mostly a data migration problem. As you split out each piece of functionality you are effectively going to have two data models until the legacy model is completely deprecated. The basic steps for this are:
Once you split out a component, start writing the data to both the old and new databases.
Backfill the new database with anything you need from the old.
Verify both have the same data.
Change everything that relies on this part of the data to read from the new component/database.
Change everything that relies on this part of the data to write to the new component/database.
Deprecate that data in the old database, i.e. back it up and then remove it. This confirms that you've migrated that chunk.
The advantage is that there should be no data loss or loss of functionality, and you have time to test each data model you've chosen for a component to see whether it works with the application flow. Slicing up a monolith can be tricky: deciding where your bounded contexts lie is critical, and there's no perfect science to it. Always keep in mind where you need your application to scale and which pieces need to perform.
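As a rough illustration of the dual-write step (the model names and attribute mapping below are invented, and cross-database writes are not atomic, which is exactly why the backfill/verify steps matter):

# Both models exist during the transition: LegacyOrder points at the old
# database, NewOrder at the new component's database.
class OrderWriter
  def create(attrs)
    legacy = LegacyOrder.create!(attrs)            # old schema stays authoritative for now
    begin
      NewOrder.create!(map_to_new_schema(attrs))   # duplicate write into the new schema
    rescue StandardError => e
      # Don't fail the user's request; the backfill/verify steps reconcile drift.
      warn "dual write to new orders failed: #{e.message}"
    end
    legacy
  end

  private

  # Translate legacy attribute names into the new bounded context's model.
  def map_to_new_schema(attrs)
    { reference: attrs[:order_no], total_cents: (attrs[:total].to_f * 100).round }
  end
end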

Related

What is the benefit of storing data in databases like SQL? [closed]

This is a very elementary question, but why does a framework like Rails use ActiveRecord to run SQL commands to get data from a DB? I heard that you can cache data on the Rails server itself, so why not just store all data on the server instead of in the DB? Is it because space on the server is a lot more expensive/valuable than on the DB? If so, why is that? Or is the reason that you want an ORM in the DB and that just takes too much code to set up on the Rails server? Sorry if this question sounds dumb, but I don't know where else I can go for an answer.
What if some other program/person wants to access this data and for some reason cannot use your Rails application? What if in the future you decide to stop using Rails and go with some other technology for the front end but want to keep the data? In these cases having a separate database helps. Also, could you run complex join queries on cached data on the Rails server?
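For example, a relational database can answer set-based questions directly that a flat cache of serialized objects cannot. With hypothetical Post and Comment ActiveRecord models:

# The database performs the join, grouping, ordering and limiting itself;
# a cache of objects on the Rails server would have to load everything
# into memory and filter it in Ruby instead.
Post.joins(:comments)
    .where("comments.created_at > ?", 1.week.ago)
    .group("posts.id")
    .order(Arel.sql("COUNT(comments.id) DESC"))
    .limit(10)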
Databases hold a substantial number of advantages over other forms of data storage. Some of them are listed below:
Data integrity is maximised and data redundancy is minimised, since a single storage place for all the data means a given set of data has only one primary record. This helps keep data as accurate and consistent as possible and enhances data reliability.
Generally better data security, since a single data storage location means there is only one place from which the database can be attacked and from which sets of data can be stolen or tampered with.
Better data preservation than other forms of storage, due to often-included fault-tolerant setups.
Easier for end-users to work with, due to the simplicity of a single database design.
Generally easier data portability and database administration, and more cost-effective than other types of database systems, since labour, power supply and maintenance costs are all minimised.
Data kept in the same location is easier to change, re-organise, mirror, or analyse.
All the information can be accessed at the same time from the same location.
Updates to any given set of data are immediately visible to every end-user.

iOS Application VIPER Architecture - how many dataManagers? [closed]

I am looking for an answer to this question in the context of the VIPER Architectural pattern -
If you have an application that talks to both a web API and a database, how many dataManagers should you have: one, two, or three?
Case
a) dataManager
b) APIDataManager and LocalDataManager
c) dataManager, APIDataManager and LocalDataManager
Where in
a) The interactor talks to a single dataManager that talks to any services you may have (remote or local).
b) The interactor knows the difference between local and remote information - and calls either the APIDataManager or the LocalDataManager, which talk to remote and local services respectively.
c) The interactor only talks to a general dataManager, the general dataManager then talks to the APIDataManager and LocalDataManager
EDIT
There may be no definitive solution. But any input would be greatly appreciated.
Neither VIPER nor The Clean Architecture dictate that there must be only one data manager for all interactors. The referenced VIPER article uses only one manager just as an example that the specific storage implementation is abstracted away.
The interactor objects implement the application-specific business rules. If what the app does is talk to the server, then turn around and talk to the local disk store, then it’s perfectly normal for an interactor to know about this. Even more, some of the interactors have to manage exactly this.
Don’t forget that the normal object composition rules apply to the interactors as well. For example, you start with one interactor that gets data from the server and saves it to the local store. If it gets too big, you can create two new interactors, one doing the fetching and the other saving to the local store. Then your original interactor would contain these new ones and delegate all its work to them. If you follow the rules for defining the boundaries, when doing the extract class refactoring, you won’t even have to change the objects that work with the new composite interactor.
Also, note that in general it is suggested not to name objects with Manager or Controller endings, because their roles become unclear. You might name the interface that talks to the server something like APIClient, and the one that abstracts your local storage something like EntityGateway or EntityRepository.
It depends on where the abstraction lies within your app, that is, on distinguishing what you do from how you do it. Who is defining that there are two different data stores?
If local and remote data stores are part of the problem domain itself (e.g. sometimes the problem requires fetching remote data, and other times it requires fetching local data), it is sensible for the interactor to know about the two different data stores.
If the Interactor only cares about what data is requested, but it does not care about how the data is retrieved, it would make sense for a single data manager to make the determination of which data source to use.
There are two different roles at play here—the business designer, and the data designer. The interactor is responsible for satisfying the needs of the business designer, i.e. the business logic, problem domain, etc. The data layer is responsible for satisfying the needs of the data designer, i.e. the server team, IT team, database team, etc.
Who is likely to change where you look to retrieve data, the business designer, or the data designer? The answer to that question will guide you to which class owns that responsibility.
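For what it's worth, here is a tiny sketch of option (c), using the names from the question (written in Ruby purely for brevity; in a real VIPER app this would of course be Swift or Objective-C, and the reachability check and stand-in bodies are invented for the example):

# Remote source: stand-in for the web API client.
class APIDataManager
  def fetch_items
    ["fresh item from the API"]
  end
end

# Local source: stand-in for the database/disk store.
class LocalDataManager
  def initialize
    @items = []
  end

  def fetch_items
    @items
  end

  def save_items(items)
    @items = items
  end
end

# The general dataManager: the interactor asks *what* it needs, and this
# object decides *how* (remote vs. local) and keeps the local copy warm.
class DataManager
  def initialize(api:, local:, online: -> { true })
    @api, @local, @online = api, local, online
  end

  def fetch_items
    return @local.fetch_items unless @online.call
    @api.fetch_items.tap { |items| @local.save_items(items) }
  end
end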

How to secure Rails app with several companies sharing application and databases [closed]

Environment: Ruby 2.0.0, Rails 4.1, Windows 8.1, Devise, CanCan, Rolify all on PostgreSQL.
I am building an app that will have multiple companies sharing it. Each company will have Devise admins that will manage their users. And each company will have its own data in use. All of this is planned to share tables, isolated by company id within those tables. The app is currently working with user management with no problems. Each admin sees and interacts with only their company's users. I am about to build the MVC for the main application.
I want to take a reality check at this point. How exposed will one company be to another? What exposures will exist and how do I mitigate them? Is there another gem out there that will help me implement this? Or, is this just a really, really bad enough idea that I should isolate each company to its own image?
Properly isolating customers from each other is harder than it seems. It's not just a one-time event; you will have to keep it in mind and continue to deal with it as you grow. And data segregation is just one part of the problem. All of your resources (servers, databases, caches, background workers, etc.) are contended for by your customers, and the actions of one customer can have an impact on your app's performance for others.
Definitely do your research on multi-tenancy techniques, but I would suggest you ultimately settle on wrapping a simple solution in an abstraction that is seamless to the rest of the app. Something like:
for_customer(1) do
  # This should return only the models visible to customer 1,
  # regardless of where they live or however they are partitioned.
  MyModel.all
end
For the web case, that code can wrap controller actions via an around filter. Don't worry about implementing it crudely now; that's why you have an abstraction and why the partitioning code lives in one place. As things change and you encounter problems and/or deficiencies, improve the implementation and deploy.
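A rough sketch of one way that abstraction could be implemented, assuming every tenant-owned table has a company_id column (all names here are illustrative, not production code):

module Tenancy
  # Set the current tenant for the duration of the block, then restore it.
  def self.for_customer(company_id)
    previous = Thread.current[:tenancy_company_id]
    Thread.current[:tenancy_company_id] = company_id
    yield
  ensure
    Thread.current[:tenancy_company_id] = previous
  end

  def self.current_company_id
    Thread.current[:tenancy_company_id]
  end
end

# Mixed into tenant-owned models so ordinary queries are scoped automatically
# while inside a for_customer block.
module TenantScoped
  def self.included(model)
    model.class_eval do
      default_scope do
        Tenancy.current_company_id ? where(company_id: Tenancy.current_company_id) : all
      end
    end
  end
end

# Controllers wrap every action in the around filter, so the rest of the app
# never has to think about tenancy explicitly.
class ApplicationController < ActionController::Base
  around_action :scope_to_current_company

  private

  def scope_to_current_company
    Tenancy.for_customer(current_user.company_id) { yield }
  end
end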
I work at a SaaS company with several hundred customers all getting real traffic, and there was no way we could have foreseen all of the issues we'd eventually run into in keeping customers isolated from one another. Things like Passenger not correctly clearing memcached connections across process forks at startup during a seamless deployment. Or code that would correctly ensure db connections weren't shared across Resque worker process forks suddenly becoming inadequate after an ActiveRecord upgrade.
Don't try to figure everything out now; just make sure this code lives in one spot and that if it changes, it's not going to have a cascade effect on the rest of your app. Because you know it's going to need to change.

Creating an iCloud like server [closed]

So, iCloud is still under NDA, but my question isn't so much about iCloud as it is how to implement something inspired by iCloud.
We all know iCloud is just a server-based mechanism for syncing Documents. I'm just really inspired by the Documents aspect of it. It seems like a different paradigm to me to focus on syncing documents. The web APIs I have written (which aren't many) have all been SQL-database driven.
An example:
A simple blog post usually is something like this:
A row in a database that contains the title, content, date published, author.
If you want to update the title, for instance, you update that row/column in the DB. Easy, until you want to sync a bunch of clients who are making offline changes.
But suppose a blog post were a single document that maintained its own internal structure, with a title, content, etc., all within one file. When you modify the title, the document is updated locally, and then you push the entire document (or a diff) up to the server. The server just replaces the old document with the new document, and voilà, the title is updated on the server. Obviously this can lead to merge conflicts, but those can be handled by sending conflicting documents back to the clients.
Anyway, I like that approach and I can see how it could be really useful for many web apps, especially those that want to support modifying data while offline and syncing easily once an internet connection is available, which is why it's great for iOS.
My question: is there a name for what I'm talking about, and are there some useful reading materials available for learning how to implement such a technology?
In Other Words (Edit)
The iCloud API (currently under NDA) is just plain cool, and I want to start organizing the data in my iOS apps as Documents synced with a Document Server, rather than just plain Core Data objects syncing through some REST API. How cool would it be if you could deploy an iCloud-like server that was custom tailored for your iOS app?!
IMO the database approach is more practical than dealing with additional files, but this can be done with PHP or C++ functions. I have done something similar, where you edit a file through C++ and then put it in a folder; the folder can be any folder that is synced with the server.
After more research, I realized what I was looking for is called a document-oriented database, and I have selected Apache CouchDB as a good document-oriented database for use with my iPhone app. It's not exactly the same as iCloud, but it has many similar features.
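To give a feel for the "one document per record" idea, here is a small sketch against CouchDB's plain HTTP API. It assumes a local CouchDB that accepts unauthenticated writes and an existing database named "blog"; the ids and fields are made up:

require "net/http"
require "json"
require "uri"

BASE = URI("http://localhost:5984/blog")

def put_json(path, doc)
  uri = URI("#{BASE}#{path}")
  Net::HTTP.start(uri.host, uri.port) do |http|
    req = Net::HTTP::Put.new(uri, "Content-Type" => "application/json")
    req.body = JSON.generate(doc)
    http.request(req)
  end
end

# First write of the whole blog post as a single document.
res = put_json("/post-1", { "title" => "Hello", "content" => "First draft" })
rev = JSON.parse(res.body)["rev"]

# Updates must carry the current _rev. A stale _rev gets a 409 Conflict,
# which is the hook for pushing conflicting versions back to clients to merge.
put_json("/post-1", { "title"   => "Hello, world",
                      "content" => "First draft",
                      "_rev"    => rev })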

To go API or not [closed]

My company has this Huge Database that gets fed with (many) events from multiple sources, for monitoring and reporting purposes. So far, every new dashboard or graphic from the data is a new Rails app with extra tables in the Huge Database and full access to the database contents.
Lately, there has been an idea floating around of giving external clients (as in, not our company but sister companies) access to our data, and it has been decided we should expose a read-only RESTful API to consult our data.
My point is - should we use an API for our own projects too? Is it overkill to access a RESTful API, even for "local" projects, instead of direct access to the database? I think it would pay off in terms of unifying our team's access to the data - but is it worth the extra round-trip? And can a RESTful API keep up with the demands of running 20 or so queries per second and exposing the results via JSON?
Thanks for any input!
I think there's a lot to be said for consistency. If you're providing an API for your clients, it seems to me that by using the same API yourself you'll understand it better with respect to supporting it for your clients, you'll be testing it regularly (beyond your regression tests), and you're sending a message that it's good enough for you to use, so it should be fine for your clients.
By hiding everything behind the API, you're at liberty to change the database representations and not have to change both API interface code (to present the data via the API) and the database access code in your in-house applications. You'd only change the former.
Finally, such performance questions can really only be addressed by trying it and measuring. Perhaps it's worth knocking together a prototype API system and studying it under load?
I would definitely go down the API route. This presents an easy-to-maintain interface to ALL the applications that will talk to your application, including validation etc. Sure, you can ensure database integrity with column restrictions and stored procedures, but why maintain that as well?
Don't forget - you can also cache the API calls in the file system, in memory, or using memcached (or any other service). Where datasets have not changed (check with updated_at or ETags) you can simply return cached versions for tremendous speed improvements. The addition of ETags in a recent application I developed saw HTML load time go from 1.6 seconds to 60 ms.
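In Rails this is mostly built in via conditional GETs; a hypothetical read-only controller in the API might look like this (model and controller names are made up):

class EventsController < ApplicationController
  # If the client repeats the request with the ETag / Last-Modified it was
  # given, stale? returns false and Rails answers 304 Not Modified without
  # rendering the body at all.
  def show
    @event = Event.find(params[:id])
    if stale?(etag: @event, last_modified: @event.updated_at)
      render json: @event
    end
  end
end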
Off topic: An idea I have been toying with is dynamically loading API versions depending on the request. Something like this would give you the ability to dramatically alter the API while maintaining backwards compatibility. Since the different versions are in separate files it would be simple to maintain them separately.
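One way that idea might look with Rails routing constraints (the Accept header format, module names and resources are invented for the example):

# Hypothetical constraint object: picks an API version based on the Accept header.
class ApiVersion
  def initialize(version, default: false)
    @version, @default = version, default
  end

  def matches?(request)
    accept = request.headers["Accept"].to_s
    accept.include?("application/vnd.myapp.v#{@version}") ||
      (@default && !accept.include?("vnd.myapp"))
  end
end

# config/routes.rb: each version lives in its own module and files, so old
# versions can be kept around untouched while new ones evolve.
Rails.application.routes.draw do
  scope module: :v2, constraints: ApiVersion.new(2, default: true) do
    resources :events, only: [:index, :show]
  end
  scope module: :v1, constraints: ApiVersion.new(1) do
    resources :events, only: [:index, :show]
  end
end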
Also, if you use the API internally you should be able to reduce the amount of code you have to maintain, since you will just be maintaining the API rather than both the API and your own internal methods for accessing the data.
I've been thinking about the same thing for a project I'm about to start, whether I should build my Rails app from the ground up as a client of the API or not. I agree with the advantages already mentioned here, which I'll recap and add to:
Better API design: you become a user of your own API, so it will be a lot more polished when you decide to open it;
Database independence: with reduced coupling, you could later switch from an RDBMS to a Document Store without changing as much;
Comparable performance: Performance can be addressed with HTTP caching (although I'd like to see some numbers comparing both).
On top of that, you also get:
Better testability: your whole business logic is black-box testable with basic HTTP request/response. Headless browsers / Selenium become responsible only for application-specific behavior;
Front-end independence: you not only become free to change database representation, you become free to change your whole front-end, from vanilla Rails-with-HTML-and-page-reloads, to sprinkled-Ajax, to full-blown pure javascript (e.g. with GWT), all sharing the same back-end.
Criticism
One problem I originally saw with this approach was that it would make me lose all the amenities and flexibility that ActiveRecord provides, with associations, named_scopes and all. But using the API through ActiveResource brings a lot of the good stuff back, and it seems like you can also have named_scopes. Not sure about associations.
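For reference, the ActiveResource side is only a few lines (the site URL and resource name below are placeholders):

require "active_resource"

# Hypothetical resource backed by the internal API instead of a local table.
class Event < ActiveResource::Base
  self.site = "https://api.example.internal"
end

Event.find(42)                                   # roughly GET /events/42.json
Event.find(:all, params: { source: "billing" })  # roughly GET /events.json?source=billing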
More Criticism, please
We've all been singing the glories of this approach but, even though an answer has already been picked, I'd like to hear from other people what possible problems this approach might bring, and why we shouldn't use it.
