Making a connection between multiple databases

I'm using Java DB (Derby).
I want to import a public view of my data into another database (also in Java DB).
I want to read this data and save it into the other database. I'm having trouble since the general rule seems to be one connection per database.
Help would be much appreciated.

You need two connections, one to each database.
If you want the two operations to be a single unit of work, you should use XA JDBC drivers so you can do two-phase commit. You'll also need a JTA transaction manager.
This is easy to do with Spring.
SELECT from one connection; INSERT into the other. Just standard JDBC is what I'm thinking. You'll want to batch your INSERTs and checkpoint them if you have a lot of rows so you don't build up a huge rollback segment.
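A minimal JDBC sketch of this approach, assuming two local Derby databases and a hypothetical public_view/copy_table schema; adjust the URLs, table, and column names to your setup:

```java
import java.sql.*;

public class CopyBetweenDerbyDbs {
    public static void main(String[] args) throws SQLException {
        // Hypothetical JDBC URLs and table/column names.
        try (Connection source = DriverManager.getConnection("jdbc:derby:sourceDb");
             Connection target = DriverManager.getConnection("jdbc:derby:targetDb")) {

            target.setAutoCommit(false); // commit in batches instead of per row

            try (Statement select = source.createStatement();
                 ResultSet rs = select.executeQuery("SELECT id, name FROM public_view");
                 PreparedStatement insert =
                     target.prepareStatement("INSERT INTO copy_table (id, name) VALUES (?, ?)")) {

                int pending = 0;
                while (rs.next()) {
                    insert.setInt(1, rs.getInt("id"));
                    insert.setString(2, rs.getString("name"));
                    insert.addBatch();

                    if (++pending % 1000 == 0) {   // flush and checkpoint every 1000 rows
                        insert.executeBatch();
                        target.commit();
                    }
                }
                insert.executeBatch();             // flush the remainder
                target.commit();
            } catch (SQLException e) {
                target.rollback();
                throw e;
            }
        }
    }
}
```

Note that this gives you batching and periodic checkpoints, but not a single atomic unit of work across both databases; for that you would need the XA/JTA setup described above.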
I'd wonder why you have to duplicate data this way. "Don't Repeat Yourself" would be a good argument against it. Why do you think you need it in two places like this?

Related

Give users read-only access to Neo4j while doing Batch Update

This is just a general question, not too technical. We have this use-case wherein we are to load hundreds of thousands of records to an existing Neo4j database. Now, we cannot afford to make the database offline because of users who are accessing it. I know that Neo4j requires exclusive lock on the database while it's performing batch updates. Is there a way around my problem? I don't want to lock my database while doing updates. I still want my users to access it - even for just read-only access. Thanks.
Neo4j never requires an exclusive lock on the whole database. It selectively locks the portions of the graph that are affected by mutating operations. So there are some things you can do to achieve your goal. Are you a Neo4j Enterprise customer?
Option 1: If so, you can run your batch insert on the master node and route users to slaves for reading.
Option 2: Alternatively, you could do a "blue-green" style deployment where you:
take a backup (B) of your existing database (A), then mark database A read-only
apply your batch inserts onto B, either by starting a separate instance or, even better, by using BatchInserters (a minimal sketch follows below). That way, you'll insert your hundreds of thousands of records in a few seconds
start the new database B
flip a switch on a load balancer so that users start to be routed to B instead of A
take A down
(Please let me know if you need some tips on how to make a read-only DB.)
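A minimal sketch of the BatchInserters approach mentioned above, written against the Neo4j 2.x embedded API; the store path, property names, and relationship type are hypothetical, and the BatchInserter needs exclusive access to the store (no other instance may have it open while it runs):

```java
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class BulkLoad {
    public static void main(String[] args) {
        // Path to the copy (database B); nothing else may have it open.
        BatchInserter inserter = BatchInserters.inserter("/data/neo4j-copy");
        try {
            long alice = inserter.createNode(MapUtil.map("name", "Alice"));
            long bob   = inserter.createNode(MapUtil.map("name", "Bob"));
            inserter.createRelationship(alice, bob,
                    DynamicRelationshipType.withName("KNOWS"), null);
            // ... loop over your hundreds of thousands of records here ...
        } finally {
            inserter.shutdown(); // flushes everything to disk
        }
    }
}
```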
Option 3: If you can only afford to run one instance at any one time, then there are techniques you can employ to let your users access the database as usual and still insert large volumes of data. One of them could be using a single-threaded "writer" with a queue that batches write operations. Because one thread only ever writes to the database, you never run into deadlock scenarios and people can happily read from the database. For option 3, I suggest using GraphAware Writer.
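A minimal sketch of that single-threaded writer pattern using plain java.util.concurrent and the embedded Neo4j 2.x transaction API; this is not the GraphAware Writer API itself, and the queue and batch sizes are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Transaction;

public class SingleThreadedWriter {
    private final GraphDatabaseService db;
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(10_000);

    public SingleThreadedWriter(GraphDatabaseService db) {
        this.db = db;
        Thread writer = new Thread(this::drainLoop, "graph-writer");
        writer.setDaemon(true);
        writer.start();
    }

    /** Called from any thread: enqueue a unit of write work. */
    public void write(Runnable work) throws InterruptedException {
        queue.put(work);
    }

    /** The only thread that ever writes; readers are unaffected. */
    private void drainLoop() {
        List<Runnable> batch = new ArrayList<>();
        while (true) {
            try {
                batch.clear();
                batch.add(queue.take());          // block until there is work
                queue.drainTo(batch, 999);        // group up to 1000 operations per transaction
                try (Transaction tx = db.beginTx()) {
                    for (Runnable work : batch) {
                        work.run();
                    }
                    tx.success();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

Because only this one thread ever opens write transactions, readers never deadlock against concurrent writers, and the batching keeps transaction overhead low.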
I've assumed you are not trying to insert hundreds of thousands of nodes to a running Neo4j database using Cypher. If you are, I would start there and change it to use Java APIs or the BatchInserter API.

Does using multiple databases in Rails 3 affect performance?

I'm currently with a client using a Heroku database with a row limit, and we have a big table that logs activities. After trying some solutions, I found out that I can use a different connection for a single model.
Will it affect performance by a lot? I think I'm going to create a micro instance with Postgres for just this table.
No, having more than one database connection shouldn't affect performance. It might use slightly more memory per worker process, but probably not by enough to be a concern.
The way I set up a second database connection was to use another stanza in database.yml. In addition to the development and production sections, there can be an 'other_db' connection that you reference in the model that uses it. If you wind up having several models use the other connection, you may want to create one 'superclass' model that inherits from ActiveRecord::Base and just has the establish_connection line, and have the actual models inherit from that one. That way you don't keep repeating the establish_connection line.
Depending on your other requirements this seems like it might be a good use for Heroku's Postgres "follow" feature which creates a read-only instance that shadows your primary DB. More info here: Heroku Postgres Follow
I think having multiple databases is a good idea if you are dealing with a lot of data and multiple tenants for an application. I use a different section in my database.yml, and with establish_connection the relevant models point to that section. This also avoids data repetition.
Consider that we have multiple tenants, each with a different database, but all tenants consume some common data. For the models that hold this common data we can use a separate database, so that all tenants fetch it from one place and we avoid repeating the data.

Migrate Data from Neo4j to SQL

Hi, I am using Neo4j in my application and my setup is as follows:
I am using Embedded Graph API
I have several databases that I point to using a pool that I maintain in my application eg-> db1, db2, db3, ..... db100
When I want to access a particular database I point to it using new EmbeddedGraphDatabase("Path to db(n)")
The problem is that as the connection pool count increases, the RAM consumed by the application keeps increasing, and at some point it breaks down the application.
So I am thinking of migrating from Neo4j to some other database.
Additionally only a small part of my database is utilizing the graph structure.
One way to migrate is to write a script for it. Is there a better option?
My other question is: which database would be best for maintaining my structure?
Another option I am considering is keeping part of my data in Neo4j and moving another part to some other database.
If anything is unclear I can clarify.
Thanks in advance.
An EmbeddedGraphDatabase instance is not the equivalent of a "connection" in SQL. It's designed to run a long time (days, months). Hence starting/stopping is costly.
What is the use case for having hundreds of separate databases in the same JVM?
Lots of small databases will perform poorly, as the graph database is designed to hold the whole data model on a single host.
Do you run a single JVM per database?
You can control the amount of memory used by Neo4j by providing the correct properties for memory mapping, and by using the GCR cache from neo4j-enterprise and controlling its cache-size properties.
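A minimal sketch of passing such settings to the old EmbeddedGraphDatabase constructor the question uses (Neo4j 1.x style); the property names and sizes are examples, they differ between Neo4j versions, and the GCR cache requires the enterprise edition:

```java
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class TunedEmbeddedDb {
    public static GraphDatabaseService open(String path) {
        Map<String, String> config = new HashMap<>();
        // Memory-mapping sizes: tune these down when running many small databases.
        config.put("neostore.nodestore.db.mapped_memory", "20M");
        config.put("neostore.relationshipstore.db.mapped_memory", "50M");
        config.put("neostore.propertystore.db.mapped_memory", "50M");
        // GCR cache is only available in neo4j-enterprise.
        config.put("cache_type", "gcr");
        return new EmbeddedGraphDatabase(path, config);
    }
}
```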
I think it still makes sense to keep the graph part in Neo4j and only move the non-graphy part.

multiple db connections vs. centralized/redundant db

I have a project to create a dashboard that will connect to existing systems as well as create new features based on combining data from the existing systems. For example, the dashboard will be able to generate "orders" containing data merged from "members" (MS Access DB), "employees" (MySQL DB) and "products" (flat file), and there will also be new attributes particular to "orders."
At first I thought it would be most efficient to have my application connect to each of the systems separately and perform cross-vendor joins between the different databases. But then I thought that creating a centralized/redundant db (built with scripts pushing and pulling data between the systems) might also be useful because it would empower some semi-technical staff to use products like OOBase, which can only make a single connection.
Are there any other advantages to creating a centralized/redundant DB like the one I'm talking about? Or are multiple direct connections the best approach?
Thanks in advance for any tips.
To give you a short answer: yes, you want central data storage.
You don't want to run complex reports on your live database. As your live database grows you will want to do some housekeeping and clean it up, but keep the data for analysis.
You will also want the data to be aggregated so you could perform historical analysis.
Some clean-up will be required for the data that comes from different sources, and you will probably need to know how to link your data together; there are quite a lot of things like that you will have to be aware of to do the job properly.
You might consider reading on data warehousing (wikipedia) and business intelligence (wikipedia).
If you want to have 'new features' added to this system, you could also look up orchestration (wikipedia). It will allow you to link your heterogeneous business processes together.
All of these are quite specialized and complex disciplines on their own so you might want to have a specialist to consult you.
Be very, very careful about copying lots of data around. If you do, here are some important guidelines:
Make sure that one system is defined as the master and no other system may tamper with the data.
Always copy data from the master to the slaves.
When you copy the data, use a checksum of some kind to make sure all data has been copied. Make sure you can handle "yesterday, the copy failed".
If a slave must make a change, push the change to the master and then use the standard "update" path to merge it back to the slave. Avoid "save change on slave and update the master some time in the future".
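A minimal JDBC sketch of the checksum guideline above, assuming hypothetical JDBC URLs and a table with a numeric id column; a real system would likely use a stronger hash than a plain sum:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CopyVerifier {

    /** Row count and a simple checksum (sum of a numeric key) for one table. */
    private static long[] countAndSum(Connection c, String table) throws SQLException {
        try (Statement st = c.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT COUNT(*), COALESCE(SUM(id), 0) FROM " + table)) {
            rs.next();
            return new long[] { rs.getLong(1), rs.getLong(2) };
        }
    }

    public static boolean copyLooksComplete(String masterUrl, String slaveUrl, String table)
            throws SQLException {
        try (Connection master = DriverManager.getConnection(masterUrl);
             Connection slave = DriverManager.getConnection(slaveUrl)) {
            long[] m = countAndSum(master, table);
            long[] s = countAndSum(slave, table);
            // If these disagree, yesterday's copy (or today's) failed: re-run it.
            return m[0] == s[0] && m[1] == s[1];
        }
    }
}
```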

Using multithreading for making queries in Delphi

I've recently been using threads to run queries against a MySQL database. I use MyDAC for the DB connection, and because TMyConnection does not allow simultaneous queries on a single connection, I create a new connection and a new query object for every thread executing a query. So at any given time the server may hold several connections per client. If we consider this scenario with several clients connecting to the database, this could be a problem, I guess. Is there a better solution for using threads with queries?
Thanks in advance
Use a second tier where you can pool some connections (you can do this with DataSnap or RemObjects...). This way you can reuse connections across all of your users and keep the number of connections at a lower level.
Have a look at Cary Jansen's article called
Using Semaphores in Delphi, Part 2: The Connection Pool
He goes into great detail about how to provide thread-safe access to a limited number of database connections.
Getting his code to work with MyDAC's TMyConnection is trivial.
