What's the difference between Apache Ignite, IWA (Informix Warehouse Accelerator), and Infinispan?
I have an application that accepts a large volume of data and processes many transactions per second. Both response time and data integrity are very important for us. Which in-memory database is the best solution for me? I'm confused about which one to select. I also use Java EE, and the application server is JBoss.
We are looking for the best in-memory database solution for processing data in real time.
Update:
I use a relational database. I am looking for an in-memory database to select from, insert into, and update against in order to decrease response time. Data integrity is very important, and it is also very important that data is persisted to disk.
Apache Ignite and Infinispan are both data grids / memory-centric databases with similar feature lists, the biggest difference being that Apache Ignite has SQL support for querying data.
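To illustrate the SQL side, here is a minimal sketch of an Ignite SQL query from Java (the Trade class, the "trades" cache name, and the query itself are made up for this example; the cache and SqlFieldsQuery APIs are Ignite's standard ones):

    import java.util.List;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.query.SqlFieldsQuery;
    import org.apache.ignite.cache.query.annotations.QuerySqlField;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class IgniteSqlSketch {
        public static void main(String[] args) {
            // Start an embedded Ignite node (it can also join an existing cluster).
            try (Ignite ignite = Ignition.start()) {
                // A cache whose entries are queryable via SQL.
                CacheConfiguration<Long, Trade> cfg = new CacheConfiguration<>("trades");
                cfg.setIndexedTypes(Long.class, Trade.class);

                IgniteCache<Long, Trade> trades = ignite.getOrCreateCache(cfg);
                trades.put(1L, new Trade("ACME", 42.0));

                // SQL over the in-memory data, which is the capability mentioned above.
                SqlFieldsQuery qry = new SqlFieldsQuery(
                    "SELECT symbol, price FROM Trade WHERE price > ?").setArgs(10.0);

                List<List<?>> rows = trades.query(qry).getAll();
                rows.forEach(row -> System.out.println(row.get(0) + " -> " + row.get(1)));
            }
        }

        /** Hypothetical value class; @QuerySqlField exposes fields to SQL. */
        public static class Trade {
            @QuerySqlField
            String symbol;
            @QuerySqlField
            double price;

            Trade(String symbol, double price) {
                this.symbol = symbol;
                this.price = price;
            }
        }
    }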
Informix Warehouse Accelerator seems to be a narrow use-case product, so it's hard to say whether it's useful for your use case or not.
Otherwise, there is too little information in your question about the specifics of your project to say whether either of these is a good fit, or whether neither of them is.
Related
Is it a good idea to continuously use External Data Access (EDA) for synchronizing big files (say, with 10 million records) with an RDBMS? Will EDA also handle incremental updates to the source UniData file and automatically reflect those changes (CREATE, UPDATE, and DELETE) in the target RDBMS?
Also, according to the documentation, EDA currently supports MSSQL, Oracle, and DB2. Is it possible to configure EDA to work with, for example, PostgreSQL?
While creating a basic workflow using KNIME and PostgreSQL, I have encountered problems selecting the proper node for fetching data from the database.
In the node repository we can find at least:
1) PostgreSQL Connector
2) Database Reader
3) Database Connector
Actually, we can do the same using 2) alone, or by connecting either 1) or 2) to node 3)'s input.
I assumed there were some hidden advantages, like improved performance with complex queries or better overall stability, but on the other hand we are using exactly the same database driver anyway.
There is a big difference between the Connector Nodes and the Reader Node.
The Database Reader reads data into KNIME; the data then lives on the machine running the workflow. This can be a bad idea for big tables.
The Connector nodes do not. The data remains where it is (usually on a remote machine in your cluster). You can then connect Database nodes to the connector nodes. All data manipulation then happens within the database; no data is loaded onto your machine (unless you use the output port preview).
As for the difference between the other two:
The PostgreSQL Connector is just a special case of the Database Connector that comes with a pre-set configuration. However, you can achieve the same configuration with the Database Connector, which also lets you choose more detailed options for non-standard databases.
One advantage of using 1 or 2 is that you only need to enter connection details once for a database in a workflow, and can then use multiple reader or writer nodes. I'm not sure if there is a performance benefit.
1 offers simpler connection details than 2, thanks to the bundled PostgreSQL JDBC driver.
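For context, both nodes ultimately open a plain JDBC connection with that same driver. A rough sketch of the equivalent connection details outside KNIME; the host, database name, and credentials below are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class JdbcDetailsSketch {
        public static void main(String[] args) throws Exception {
            // The PostgreSQL Connector only asks for host/database/credentials,
            // since the PostgreSQL JDBC driver is bundled. The generic Database
            // Connector additionally needs the driver class and the full JDBC URL:
            Class.forName("org.postgresql.Driver");              // driver class
            String url = "jdbc:postgresql://db-host:5432/mydb";  // placeholder URL

            try (Connection con = DriverManager.getConnection(url, "user", "secret");
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("SELECT 1")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1));
                }
            }
        }
    }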
I'm researching graph databases for a work project. Since our data is highly connected, it appears that a graph database would be a good option for us.
One of the first graph DB options I've run into is neo4j, and for the most part, I like it. However, I have one question about neo4j to which I cannot find the answer: Can I get neo4j to store the entire graph in-memory? If so, how does one configure this?
The application I'm designing needs to be lightning-fast. I can't afford to wait for the db to go to disk to retrieve the data I'm searching for. I need the entire DB to be held in-memory to reduce the query time.
Is there a way to hold the entire neo4j DB in-memory?
Thanks!
Further to Bruno Peres' answer, if you want to run a regular server instance, Neo4j will load the entire graph into memory when resources are sufficient. This does indeed improve performance.
The Manual has a chapter on configuring memory.
The page cache portion holds graph data and indexes - this is configured via the dbms.memory.pagecache.size property in neo4j.conf. If it is large enough, the whole graph will be stored in memory.
The heap space portion is for query execution, state management, etc. This is set via the dbms.memory.heap.initial_size and
dbms.memory.heap.max_size properties. Generally these two properties should be set to the same value, so that the whole heap is allocated on startup.
If the sole purpose of the server is to run Neo4j, you can allocate most of the memory to the heap and page cache, leaving enough left over for operating system tasks.
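For example, on a hypothetical server with 32 GB of RAM dedicated to Neo4j, the relevant lines in neo4j.conf might look roughly like this (the sizes are placeholders; size the page cache to fit your store files and indexes):

    # neo4j.conf (illustrative sizes only)
    # Page cache: large enough to hold the graph data and indexes.
    dbms.memory.pagecache.size=20g
    # Heap: query execution, state management, etc.
    # Initial and max are equal so the whole heap is allocated on startup.
    dbms.memory.heap.initial_size=8g
    dbms.memory.heap.max_size=8g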
Holding Very Large Graphs In Memory
At Graph Connect in San Francisco, 2016, Neo4j's CTO, Jim Webber, in his typical entertaining fashion, gave details on servers that have a very large amount of high performance memory - capable of holding an entire large graph in memory. He seemed suitably impressed by them. I forget the name of the machines, but if you're interested, the video archive should have details.
Neo4j isn't designed to hold the entire graph in main memory. This leaves you with a couple of options. You can either play around with the config parameters (as Jasper Blues already explained in more details) OR you can configure Neo4j to use RAMDisk.
The first option probably won't give you the best performance as only the cache is held in memory.
The challenge with the second approach is that everything is in-memory which means that the system isn't durable and the writes are inefficient.
You can take a look at Memgraph (DISCLAIMER: I'm the co-founder and CTO). Memgraph is a high-performance, in-memory transactional graph database and it's openCypher and Bolt compatible. The data is first stored in main memory before being written to disk. In other words, you can choose to make a tradeoff between write speed and safety.
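Since Memgraph speaks Bolt, a regular Bolt client works against it. A minimal sketch using the Neo4j Java driver (4.x), with a placeholder URI and query and assuming no authentication on the local instance:

    import org.neo4j.driver.AuthTokens;
    import org.neo4j.driver.Driver;
    import org.neo4j.driver.GraphDatabase;
    import org.neo4j.driver.Record;
    import org.neo4j.driver.Result;
    import org.neo4j.driver.Session;

    public class BoltClientSketch {
        public static void main(String[] args) {
            // Bolt endpoint; the same client code works against Neo4j or Memgraph.
            try (Driver driver = GraphDatabase.driver("bolt://localhost:7687", AuthTokens.none());
                 Session session = driver.session()) {
                Result result = session.run(
                    "MATCH (n:Person)-[:KNOWS]->(m:Person) RETURN n.name, m.name LIMIT 10");
                while (result.hasNext()) {
                    Record record = result.next();
                    System.out.println(record.get(0).asString() + " knows " + record.get(1).asString());
                }
            }
        }
    }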
I have streaming data coming into my consumer app that I ultimately want to show up in Hive/Impala. One way would be to use Hive-based APIs to insert the updates in batches into the Hive table.
The alternate approach is to write the data directly into HDFS as an Avro/Parquet file and let Hive detect the new data and pull it in.
I tried both approaches in my dev environment, and the 'only' drawbacks I noticed were the high latency of writing to Hive and/or the failure conditions I need to account for in my code.
Is there an architectural design pattern/best practices to follow?
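For reference, one common way to wire up the second approach is to land the Parquet files in a partitioned directory and then register each new partition with Hive over HiveServer2's JDBC interface. This is a rough sketch only; the table, path, and host names are made up:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class HivePartitionSketch {
        public static void main(String[] args) throws Exception {
            // HiveServer2 JDBC endpoint (host, database, table, and path are placeholders).
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://hive-host:10000/default", "etl_user", "");
                 Statement st = con.createStatement()) {
                // After the consumer app writes a new Parquet file under
                // /warehouse/events/dt=2017-06-01/, make Hive aware of it.
                st.execute("ALTER TABLE events ADD IF NOT EXISTS PARTITION (dt='2017-06-01') "
                         + "LOCATION '/warehouse/events/dt=2017-06-01'");
            }
        }
    }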
Question: Does Informix have a construct equivalent to Oracle's "materialized view", or is there a better way to synchronize two tables (not DBs) across a DB link?
I could write a sync myself (was asked to) but that seems like re-inventing the wheel.
Background: Recently we had to split a monolithic Informix 9.30 DB (Valent's MPM), with one part of the DB on one server and the other part on the other server, since the combination of app server and DB server couldn't handle the load anymore.
In doing this we had to split a user-defined table space (KPI Repository) arranged in a star schema of huge fact tables and well-defined dimension tables.
Unfortunately, a telco manager decided to centralize the dimension tables (normalization, no data redundancy, no coding needed) on one machine and make them available as views over a DB link on the other machine. This is both slow and unstable, as it crashes the DB server every now and then when the view is used in sub-queries (demonstrably), which is very uncool on a production server.
I may be missing something in your requirements, but could you not just use Enterprise Replication to replicate the single table across the DBs?
IDS 9.30 is archaic (four main releases off current). Ideally, it should not still be in service; you should be planning to upgrade to IDS 11.50.
As MrWiggles states, you should be looking at Enterprise Replication (ER); it allows you to control which tables are replicated. ER allows update-anywhere topologies; that is, if you have 2 systems, you can configure ER so that changes on either system are replicated to the other.
Note that IDS 9.40 and 10.00 both introduced a lot of features to make ER much simpler to manage - more reasons (if the fact that IDS 9.30 is out of support is not sufficient) to upgrade.
(IDS does not have MQT - materialized query tables.)
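Very roughly, setting up a replicate for a single dimension table with ER looks something like the following cdr commands (the server group, database, and table names are placeholders, and the exact options vary by IDS version, so treat this as a sketch and check the cdr documentation for your release):

    cdr define replicate --conflict=timestamp repl_dim_region \
        "mpm@g_server1:informix.dim_region" "select * from dim_region" \
        "mpm@g_server2:informix.dim_region" "select * from dim_region"
    cdr start replicate repl_dim_region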