Is it a good idea to continuously use External Data Access (EDA) to synchronize big files (say, 10 million records) with an RDBMS? Will EDA also handle incremental updates to the source UniData file and automatically reflect those changes (CREATE, UPDATE and DELETE) in the target RDBMS?
Also, according to the documentation, EDA currently supports MSSQL, Oracle and DB2. Is it possible to configure EDA to work with, for example, PostgreSQL?
What's the difference between Apache Ignite, IWA (Informix Warehouse Accelerator) and Infinispan?
I have an application that accepts a large volume of data and processes many transactions per second. Response time is very important for us, but so is data integrity. Which in-memory database is the best solution for me? I'm confused about how to choose between them. Also, I use Java EE and the application server is JBoss.
We are looking for the best in-memory database solution for processing data in real time.
Update:
I use a relational database, and I am looking for an in-memory database to select from, insert into and update, in order to decrease response time. Data integrity is also very important, and so is persisting the data to disk.
Apache Ignite and Infinispan are both data grids / memory-centric databases with similar feature lists; the biggest difference is that Apache Ignite has SQL support for querying data.
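For illustration, here is a minimal sketch of that SQL support (the table, columns and cache name are made up, and this assumes the Ignite JARs are on the classpath):

    import java.util.List;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.query.SqlFieldsQuery;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class IgniteSqlSketch {
        public static void main(String[] args) {
            // start an in-process node; real deployments use a configuration file
            try (Ignite ignite = Ignition.start()) {
                // a plain cache just to get an entry point into the SQL engine
                IgniteCache<?, ?> cache =
                    ignite.getOrCreateCache(new CacheConfiguration<>("demo"));

                cache.query(new SqlFieldsQuery(
                    "CREATE TABLE person (id INT PRIMARY KEY, name VARCHAR)")).getAll();
                cache.query(new SqlFieldsQuery(
                    "INSERT INTO person (id, name) VALUES (?, ?)").setArgs(1, "Alice")).getAll();

                // plain SQL over the in-memory data
                List<List<?>> rows = cache.query(
                    new SqlFieldsQuery("SELECT id, name FROM person")).getAll();
                rows.forEach(System.out::println);
            }
        }
    }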
Informix Warehouse Accelerator seems to be a narrow use case product, so it's hard to say if it's useful for your use case or not.
Otherwise, there is too little information in your question about the specifics of your project to say whether either of them is a good fit, or neither.
I am setting up Greenplum for the first time and am following the documentation. I want to set up a connection from SQL to the Greenplum database and am currently figuring out the best way to achieve this. I came across gpfdist and gpload.
How are the two different? Both use external tables, both work across the segment nodes, and both are used for parallel loading. So is there any advantage of using one over the other?
Answering your question about "I want to set up a connection from SQL to the Greenplum database"...
It's ambiguous which SQL database you are referring to.
Also, there are no direct connectivity drivers available to connect a non-Greenplum database to a Greenplum database.
However, if you want to migrate data from Oracle to Greenplum, you can use Informatica's FastClone tool.
To answer the second part of your question, regarding gpfdist and gpload: gpfdist is a file-distribution process which runs on a host system and serves files in parallel to many segments. When defining an external table to read from or write to a file, you need to specify which protocol will serve the file; in your case that is gpfdist. There are other protocols too, such as file, gphdfs and http.
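As a rough sketch of the gpfdist flow (the host etl1, port, file and columns below are made up):

    -- on an ETL host, from the shell, serve files from /data on port 8081:
    --   gpfdist -d /data -p 8081 -l /tmp/gpfdist.log &

    -- in Greenplum, a readable external table pointing at that process:
    CREATE EXTERNAL TABLE ext_sales (id int, amount numeric)
    LOCATION ('gpfdist://etl1:8081/sales.csv')
    FORMAT 'CSV';

    -- the segments then pull the file in parallel:
    INSERT INTO sales SELECT * FROM ext_sales;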
gpload is a wrapper utility which makes your work easier by automatically creating the gpfdist processes and external tables for you.
Also be aware that gpload only creates readable external tables, i.e. it loads data into Greenplum rather than unloading it.
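For comparison, here is a minimal gpload control file for the same load (database, host, file and table names are hypothetical); running gpload -f sales.yml starts the gpfdist process, creates the external table, runs the load and cleans up afterwards:

    VERSION: 1.0.0.1
    DATABASE: mydb
    USER: gpadmin
    HOST: mdw
    PORT: 5432
    GPLOAD:
       INPUT:
        - SOURCE:
             LOCAL_HOSTNAME:
               - etl1
             PORT: 8081
             FILE:
               - /data/sales.csv
        - FORMAT: csv
        - DELIMITER: ','
       OUTPUT:
        - TABLE: sales
        - MODE: insert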
gpfdist and gpload achieve the same thing: with gpfdist you do everything manually, while with gpload you can automate those activities by making entries in a config (YAML) file.
gpload is a wrapper around gpfdist, so when you load data via gpload it uses gpfdist internally.
If you want to load or migrate data from another RDBMS to Greenplum using an ETL or migration tool, the tool will use a normal COPY command. If you enable gpload (nowadays the latest versions of most ETL and migration tools support a gpload option when loading data into Greenplum), it will load the data in parallel, using gpfdist internally.
I have streaming data coming into my consumer app that I ultimately want to show up in Hive/Impala. One way would be to use Hive-based APIs to insert the updates in batches into the Hive table.
The alternative approach is to write the data directly into HDFS as an Avro/Parquet file and let Hive detect the new data and pick it up.
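To make the second approach concrete, this is roughly what I mean (table, columns and paths are made up):

    CREATE EXTERNAL TABLE events (id BIGINT, payload STRING)
    PARTITIONED BY (dt STRING)
    STORED AS PARQUET
    LOCATION '/data/events';

    -- after new files land under /data/events/dt=2016-01-01/:
    ALTER TABLE events ADD IF NOT EXISTS PARTITION (dt='2016-01-01');
    -- Impala additionally needs: REFRESH events;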
I tried both approaches in my dev environment, and the only drawback I noticed was the high latency of writing to Hive and/or failure conditions I need to account for in my code.
Are there architectural design patterns/best practices to follow?
What is the difference between ADOTable and ClientDataSet?
Both components are capable of performing batch updates, so why add the extra overhead of two additional components, ClientDataSet and DataSetProvider?
The main difference is that ClientDataSet can operate without a connection to an external database. You can use it as an in-memory table or load its contents from a file.
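A minimal sketch of that standalone use (field and file names are made up; uses the DBClient unit):

    var
      cds: TClientDataSet;
    begin
      cds := TClientDataSet.Create(nil);
      try
        // define the structure and create the dataset purely in memory
        cds.FieldDefs.Add('ID', ftInteger);
        cds.FieldDefs.Add('Name', ftString, 40);
        cds.CreateDataSet;
        cds.AppendRecord([1, 'Alice']);
        // persist locally and reload later, no database involved
        cds.SaveToFile('data.cds');
        cds.LoadFromFile('data.cds');
      finally
        cds.Free;
      end;
    end;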
In combination with DataSetProvider it is frequently used to overcome limits of unidirectional datasets and as a cache.
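The wiring for the cache scenario looks roughly like this (default component names, normally set up at design time):

    // dataset -> TDataSetProvider -> TClientDataSet
    DataSetProvider1.DataSet := ADOQuery1;
    ClientDataSet1.ProviderName := 'DataSetProvider1';
    ClientDataSet1.Open;            // fetch through the provider into memory
    // ... browse and edit in memory; the source dataset may be closed ...
    ClientDataSet1.ApplyUpdates(0); // write pending changes back; 0 = stop on first error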
A ClientDataSet is an in-memory dataset which has a lot of useful additional functionality.
One big advantage compared to InterBase/Firebird tables and queries is that you don't need to keep a transaction alive, e.g. for as long as you display the data in a grid.
Have a look at this article:
A ClientDataSet in Every Database Application
ClientDataSet is a generic implementation that works regardless of the underlying DB access library. It can work (through the provider) with any TCustomDataSet descendant, be it a dbExpress dataset, a BDE one, an ADO one, or any of the many libraries available for Delphi that allow direct database access using the native client (e.g. ODAC, Direct Oracle Access, etc.).
It can also work in a multi-tier mode, where the data access dataset and provider live in a remote server application and the TClientDataSet sits in the client application. This allows for "thin client" deployment, which doesn't require database clients or a data access library like ADO to be installed on the client (with recent versions of Delphi the required midas.dll code can be linked into the application; otherwise only midas.dll itself is required).
On top of that, it can be used as an in-memory table able to store data in a local file. It also allows for the "briefcase" model, where a thin client can still work when not connected to the database and then "sync" when a connection becomes available. That was more useful in the past, when wireless access was not common.
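A sketch of that briefcase flow (the file name is made up):

    // offline: open the local copy if there is one, otherwise fetch via the provider
    if FileExists('briefcase.cds') then
      ClientDataSet1.LoadFromFile('briefcase.cds')
    else
      ClientDataSet1.Open;

    // ... the user edits while disconnected ...
    ClientDataSet1.SaveToFile('briefcase.cds'); // the change log is saved too

    // back online: push the pending changes to the database
    ClientDataSet1.ApplyUpdates(0);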
As you can see, TClientDataSet offers a lot more than TADODataset.
The most important difference I can think of is resolving update conflicts. In fact, TClientDataSet exposes the handy ReconcileErrorForm dialog, which wraps up the process of showing the user the old and new records and allows them to specify what action to take, while with TADOTable for instance, you're basically on your own.
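Hooking it up is just the Reconcile Error Dialog unit from the Object Repository plus one event handler; a sketch:

    procedure TForm1.ClientDataSet1ReconcileError(DataSet: TCustomClientDataSet;
      E: EReconcileError; UpdateKind: TUpdateKind; var Action: TReconcileAction);
    begin
      // HandleReconcileError is declared in the repository's RecError unit;
      // it shows the conflicting old/new values and returns the user's choice
      Action := HandleReconcileError(DataSet, UpdateKind, E);
    end;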
In the application I am designing, I have to communicate with a device and store a history of data readings in a database. The device is essentially a sensor that spits out numbers via the serial port. The user end of the application is a RubyOnRails interface that allows the user to view this data and configure the device.
I am wondering what kind of connection between the device and the database you would recommend for this kind of setup.
Up to this point, I had a custom application running on a host computer (a computer with the device connected directly through a serial port) that would serve as a bridge to a MySQL database. The application would connect directly to the MySQL database and execute queries. It works fairly well, but I am not sure if this is the best solution.
The only other alternative I see is to have an intermediate application that my custom application could connect to, instead of directly going to the database. This could be a part of the main application, or something separate. Would this be a better solution?
Would you recommend another approach?
Thank you,
I have a similar structure, although I fetch my data from a web service. The way I organize it is:
Create classes in lib/imports, e.g. DailyDataImport, DailyDataSummarize (you can organize the hierarchy and names as you wish).
Create a rake task under a new namespace, say import, and add it to your crontab at whatever frequency you need; a sketch follows below. Take a look at Cron in Ruby. It's helpful.
This allows me to have better control over what goes into my database.
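The task itself can be as small as this (the class name follows the layout above; the cron line is just an example):

    # lib/tasks/import.rake
    namespace :import do
      desc "Import the latest device readings"
      task :daily_data => :environment do
        DailyDataImport.new.run
      end
    end

and in the crontab, e.g. hourly:

    0 * * * * cd /path/to/app && bundle exec rake import:daily_data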
Some questions to consider:
What schedule does the device follow to populate the data?
Do you need the data as-is, or do you want a little control over it, or do you need to process it (summarizing, aggregating, etc.)?
MS SQL Server 2008 has great data synchronisation support.
SQL Server 2008 Express is free and can act as a replication subscriber (but not publisher) for clients.
Also have a look at the Microsoft Sync Framework.