Best DB for Delphi and large databases [closed]

Using Delphi XE2: I have used AbsoluteDB for years with good success for smallish needs, but it does not scale well to large datasets. I need recommendations for a DB engine for the following conditions:
Large (multi-gigabyte) dataset, few tables with lots of small records. This is industrial-equipment historical data; most tables have new records written once a minute with a device ID, date, time and status; a couple tables have these records w/ a single data point per record, three others have 10 to 28 data points per record depending on the device type. One of the single-data-point tables adds events asynchronously and might have a dozen or more entries per minute. All of this has to be kept available for up to a year. Data is usually accessed by device ID(s) and date window.
Multi-user. A system service retrieves the data from the devices, but the trending display is a separate application and may be on a separate computer.
Fast. Able to pull any 48-hour cluster of data in at most a half-dozen seconds.
Not embedded.
Single-file if possible. Critical that backups and restores can be done programmatically. Replication support would be nice but not required.
Can be integrated into our existing InstallAware packages, without user intervention in the install process.
Critical: no per-install licenses. I'm fine with buying per-seat developer licenses, but we're an industrial-equipment company, not a software company - we're not set up for keeping track of that sort of thing.
Thanks in advance!

I would use either:
- PostgreSQL (more proven than MySQL for such huge data), or
- MongoDB
The main criterion is what you would do with the data, and you did not say much about that. Would you do individual queries by data point? Would you need to do aggregates (sum/average...) over data points per type? If "Data is usually accessed by device ID(s) and date window", then I would perhaps not store the data in individual rows, one row per data point, but gather data within "aggregates", i.e. objects or arrays stored in a "document" column.
You may store those aggregates as BLOBs, but that may not be efficient. Both PostgreSQL and MongoDB have powerful object and array functions, including indexes within the documents.
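For instance, here is a minimal sketch of that idea in PostgreSQL; the table, column names and connection string are hypothetical. All data points for one device and one minute are gathered into a single JSONB document, and the "device ID + date window" retrieval runs against the primary-key index.

    # Sketch only: hypothetical schema for per-minute device readings stored as JSONB.
    import json
    import psycopg2

    conn = psycopg2.connect("dbname=historian")  # placeholder connection string
    cur = conn.cursor()

    cur.execute("""
        CREATE TABLE IF NOT EXISTS device_readings (
            device_id   integer     NOT NULL,
            recorded_at timestamptz NOT NULL,
            status      integer     NOT NULL,
            points      jsonb       NOT NULL,   -- 1 to 28 data points, one document per minute
            PRIMARY KEY (device_id, recorded_at)
        );
        -- The primary key already covers "device ID + date window" lookups; a GIN index
        -- additionally allows filtering inside the document column when needed.
        CREATE INDEX IF NOT EXISTS device_readings_points_idx
            ON device_readings USING gin (points);
    """)

    # One row per device per minute, all data points gathered into one document.
    cur.execute(
        "INSERT INTO device_readings VALUES (%s, %s, %s, %s)",
        (42, "2014-05-01 10:15:00+00", 0, json.dumps({"temp": 71.3, "flow": 12.9})),
    )

    # Pulling a 48-hour window for one device is a single index-range scan.
    cur.execute(
        """SELECT recorded_at, status, points
             FROM device_readings
            WHERE device_id = %s
              AND recorded_at BETWEEN %s AND %s
            ORDER BY recorded_at""",
        (42, "2014-05-01 00:00:00+00", "2014-05-03 00:00:00+00"),
    )
    rows = cur.fetchall()
    conn.commit()

In MongoDB the equivalent would be one document per device per minute, with a compound index on the device ID and timestamp fields.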
Don't start from the DB, but start from your logic: which data are you gathering? how is it acquired? how is it later on accessed? Then design high level objects, and let your DB store your objects in an efficient way.
Also consider the CQRS pattern: it is a good idea to store your data in several places, in several layouts, and make a clear distinction between writes (Commands) and reads (Queries). For instance, you may send all individual data points in a database, but gather the information, in a ready-to-use form, in other databases. Don't hesitate to duplicate the data! Don't rely on a single-database-centric approach! This is IMHO the key for fast queries - and what all BigData companies do.
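A minimal sketch of that CQRS-style split, using SQLite and made-up table names just to show the shape: commands append raw readings to a log, a consolidation step duplicates them into a ready-to-use hourly layout, and the trending queries only ever read the summary.

    # Sketch of a CQRS-style write/read split (hypothetical tables, SQLite for brevity).
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE readings (device_id INTEGER, recorded_at TEXT, value REAL);  -- write model
        CREATE TABLE hourly_summary (device_id INTEGER, hour TEXT, avg_value REAL,
                                     PRIMARY KEY (device_id, hour));              -- read model
    """)

    def record_reading(device_id, recorded_at, value):
        """Command side: append-only write of the raw data point."""
        conn.execute("INSERT INTO readings VALUES (?, ?, ?)", (device_id, recorded_at, value))

    def rebuild_summary():
        """Consolidation: duplicate the data into a query-friendly layout."""
        conn.execute("DELETE FROM hourly_summary")
        conn.execute("""
            INSERT INTO hourly_summary
            SELECT device_id, substr(recorded_at, 1, 13) AS hour, avg(value)
              FROM readings GROUP BY device_id, hour
        """)

    def trend(device_id):
        """Query side: the trending display reads the summary, never the raw log."""
        return conn.execute(
            "SELECT hour, avg_value FROM hourly_summary WHERE device_id = ? ORDER BY hour",
            (device_id,)).fetchall()

    record_reading(42, "2014-05-01T10:15:00", 71.3)
    record_reading(42, "2014-05-01T10:16:00", 72.1)
    rebuild_summary()
    print(trend(42))   # roughly [('2014-05-01T10', 71.7)]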
Our Open Source mORMot framework is ideal for such a process. I'm currently working on a project gathering information in real time from thousands of remote devices connected via the Internet (alarm panels, in fact), then consolidating this data in a farm of servers. We use SQLite3 for local storage on each node (up to some GB), and consolidate the data in MongoDB servers. All the logic is written in high-level Delphi objects, and the framework does all the needed plumbing (including real-time replication and callbacks).

Related

Firebase realtime database vs Cloud Firestore for my app [closed]

I'm developing an app for distributors which gives them a list of stores and lets them record in the app how many products they sold to each store. This is the main function of my app. There are two choices: Firebase Realtime Database or Cloud Firestore. I only need speed and low pricing. Please tell me how their pricing works and which one is faster. Please explain briefly.
Among other factors, the ones mentioned below are really important when choosing a Database Management System:
Data Model, Data consistency, Data Security, Data Protection, Multi Access and Integration, Efficiency, Usability, Implementation and Service Costs.
From Firebase documentation:
Which database is right for your project?
We recommend Cloud Firestore for most developers starting a new project. Cloud Firestore offers additional functionality, performance, and scalability on an infrastructure designed to support more powerful features in future releases. Expect to see new query types, more robust security rules, and improvements to performance among the advanced features planned for Cloud Firestore.
Using Cloud Firestore and Realtime Database
You can use both databases within the same Firebase app or project. Both NoSQL databases can store the same types of data and the client libraries work in a similar manner.
Please note that in the link above you will find detailed information about the differences between the two.
From medium.com:
Cloud Firestore vs. Firebase Realtime Database
Things they have in common:
They are both easy to integrate into a project with limited setup, and they are compatible with everything else offered by Firebase.
Administrators will be able to see the data through the Firebase console, which uses the same scheme in both of them. What this means is that you can scour through the nodes and collections of the top level to find the data or information that you are looking for.
Beyond that, they do not offer any further level of exploration. If you know the keys and objects that you are looking for, this will be useful.
Let’s take a look at the differences.
- Querying support: Firestore is more capable in this regard; it can locate records that match several field comparisons (see the sketch after this list). The Realtime Database uses a simpler data structure, which means a query can only filter or order on a single field.
- Importing and exporting data: this is a feature the Realtime Database provides. It comes in handy when you are migrating data, or when team members who are not developers need to make some changes to the data.
- Real-time updates: the Realtime Database focuses on real-time updates, which are very useful for social-media or collaborative apps. It gives developers everything they need to determine which customers are active users in real time.
- Costs: the cost of the Realtime Database is driven mainly by the amount of data you store and download, while Cloud Firestore charges per document read, write and delete. Be sure to look at the entire cost breakdown before making any decisions.
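To make the querying difference concrete, here is a rough sketch with the Python Admin SDK; the project URL, collection name and fields are made up, and the Firestore compound query would also need a composite index.

    # Hypothetical data: a "sales" collection/node with distributorId and quantity fields.
    import firebase_admin
    from firebase_admin import credentials, db, firestore

    firebase_admin.initialize_app(credentials.ApplicationDefault(), {
        "databaseURL": "https://example-project-default-rtdb.firebaseio.com",  # placeholder
    })

    # Cloud Firestore: several field comparisons in a single query.
    fs = firestore.client()
    big_sales = (fs.collection("sales")
                   .where("distributorId", "==", "d-42")
                   .where("quantity", ">=", 10)
                   .stream())

    # Realtime Database: only one orderBy/filter per query, so the second
    # condition has to be applied on the client after downloading the data.
    snapshot = (db.reference("sales")
                  .order_by_child("distributorId")
                  .equal_to("d-42")
                  .get()) or {}
    big_sales_rtdb = {key: rec for key, rec in snapshot.items()
                      if rec.get("quantity", 0) >= 10}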
Pricing
In the following links you will find detailed information about Firebase Realtime Database pricing and Cloud Firestore pricing.
You may also find this article useful:
Cloud Firestore vs the Realtime Database: Which one do I use?
You should think more carefully about the future workload your environment will have. That will give you better ground to evaluate the pricing. In general, I would prefer to use Firestore because it is the newer one and it is superior when it comes to scaling the service. It will scale automatically as your user base grows and will not need any sharding at high volumes.
Realtime Database is Firebase's first and original cloud-based database. For mobile apps requiring synced state across clients in realtime, it is an efficient and low-latency solution.
Cloud Firestore is Firebase’s newest flagship database for mobile apps. It is a successor to the Realtime Database with a new and more intuitive data model. Cloud Firestore is richer, faster, and more scalable than the Realtime Database.
You should check the official docs and get a proper idea of the two. Depending on your goal, each will help you in different ways.
There are things you need to consider before choosing either of them: realtime and offline support, querying, writes and transactions, reliability and performance, scalability, security, pricing, data model, etc.
I would like to suggest this link though.
Cloud Firestore vs the Realtime Database: Which one do I use?

What is the benefit of storing data in databases like SQL? [closed]

This is a very elementary question, but why does a framework like Rails use ActiveRecord to run SQL commands to get data from a DB? I heard that you can cache data on the Rails server itself, so why not just store all the data on the server instead of in the DB? Is it because space on the server is a lot more expensive/valuable than on the DB? If so, why is that? Also, can the reason be that you want an ORM in the DB and that just takes too much code to set up on the Rails server? Sorry if this question sounds dumb, but I don't know where else I can go for an answer.
What if some other program/person wants to access this data and for some reason cannot use your Rails application? What if in the future you decide to stop using Rails and go with some other technology for the front end, but want to keep the data? In these cases having a separate database helps. Also, could you run complex join queries on cached data on the Rails server?
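For example (a sketch with made-up tables, using SQLite as a stand-in for whatever database Rails talks to), a per-user order total is one declarative join in the database, whereas against data merely cached in the application you would have to reimplement that join and aggregation by hand:

    # Sketch: the kind of join/aggregate query a database answers directly.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (user_id INTEGER REFERENCES users(id), amount REAL);
        INSERT INTO users  VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO orders VALUES (1, 120.0), (1, 80.0), (2, 45.0);
    """)

    # One declarative query; the database plans and executes the join for you.
    totals = conn.execute("""
        SELECT u.name, SUM(o.amount)
          FROM users u JOIN orders o ON o.user_id = u.id
         GROUP BY u.name
         ORDER BY u.name
    """).fetchall()
    print(totals)  # [('Alice', 200.0), ('Bob', 45.0)]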
A centralised database holds a substantial number of advantages over other forms of data storage. Some of them are listed below:
- Data integrity is maximised and data redundancy is minimised, as the single storing place of all the data also implies that a given set of data only has one primary record. This helps keep the data as accurate and as consistent as possible and enhances data reliability.
- Generally bigger data security, as the single data storage location implies only one possible place from which the database can be attacked and sets of data can be stolen or tampered with.
- Better data preservation than other types of databases due to often-included fault-tolerant setup.
- Easier for the end-user to use due to the simplicity of having a single database design.
- Generally easier data portability and database administration. More cost effective than other types of database systems, as labour, power supply and maintenance costs are all minimised.
- Data kept in the same location is easier to change, re-organise, mirror, or analyse.
- All the information can be accessed at the same time from the same location.
- Updates to any given set of data are immediately received by every end-user.

Business Intelligence [closed]

I need some clarification about BI architecture, please. From what I understand, the first step is to gather data from different data sources, clean it, and load it into a data warehouse through an ETL process. The data schema of the data warehouse shouldn't be relational, and should support fast business operations (e.g. a star schema); then finally we have some reporting tools such as Qlik, Tableau, etc. My question is: what is OLAP, and at which step does it come into existence?
Thanks,
OLAP = online analytical processing, which usually means a 'cube', which is usually about reporting at various levels of summary.
This is in contrast to OLTP = online transactional processing, which usually refers to a system (usually stored in a relational database) that does a high volume of reads and writes at a detailed level.
A cube represents things to users as facts and dimensions.
A data warehouse star schema also represents things as facts and dimensions. In a data warehouse star schema (which is relational but is not normalised), these are stored in tables.
To get a 'grand total' out of a star schema you write a SQL query that runs against the database and adds up all the detail-level data into a grand total. Sometimes this takes time.
To get a 'grand total' out of a cube (OLAP) you drag and drop the dimensions and measures you want (you usually use a client tool to analyse a cube) and the answer appears much faster, because a cube is generally optimised for summaries (i.e. it usually has summaries pre-saved in it, and the storage mechanism is optimised for generating summaries).
A cube is usually built from a star schema but doesn't have to be; it just makes it a lot easier to build if it is.
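For example, against a minimal (hypothetical) star schema the 'grand total' is an aggregate SQL query that scans the detail rows at query time; a cube would typically have that summary pre-computed and stored.

    # Sketch of a grand-total query over a tiny, hypothetical star schema.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE fact_sales  (date_key INTEGER, product_key INTEGER, amount REAL);

        INSERT INTO dim_product VALUES (1, 'Widgets'), (2, 'Gadgets');
        INSERT INTO dim_date    VALUES (20150101, 2015, 1), (20150201, 2015, 2);
        INSERT INTO fact_sales  VALUES (20150101, 1, 100.0), (20150201, 1, 50.0), (20150201, 2, 75.0);
    """)

    # The relational way: every detail row is read and summed when the query runs.
    totals = conn.execute("""
        SELECT p.category, SUM(f.amount)
          FROM fact_sales f
          JOIN dim_product p ON p.product_key = f.product_key
         GROUP BY p.category
         ORDER BY p.category
    """).fetchall()
    print(totals)  # [('Gadgets', 75.0), ('Widgets', 150.0)]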
Aren't OLAP cubes represented by the data model in the warehouse (a star schema, for example)?
Yes, they are, but they are different things. One stores data in a database; one stores data in a cube. A cube is usually loaded with data, usually from a database.

What is the difference between an historian and a data warehouse? [closed]

I am working on a project to implement an historian.
I can't really find a difference between an historian and a data warehouse.
Any details would be useful.
Data Historian
Data historians are groups of tables within a database that store historical information about a process or information system.
Data historians are used to keep historical data regarding a manufacturing system. This data can be changes in the state of a data point, current values and summary data for these points. Usually this data comes from automated systems like PLCs, DCS or other process control systems. However, some historian data can be human-entered.
There are several historians available for commercial use; however, many of the most common historians have tended to be custom developed. Commercial products include OSIsoft's PI and GE's Data Historian.
Some examples of data that could be stored in a data historian are items (or tags) like:
- Total products manufactured for the day
- Total defects created on a particular crew shift
- Current temperature of a motor on the production line
- Set point for the maximum allowable value being monitored by another tag
- Current speed of a conveyor
- Maximum flow rate of a pump over a period of time
- Human-entered marker showing a manual event occurred
- Total amount of a chemical added to a tank
These items are some of the important data tags that might be captured. However, once captured, the next step is the presentation or reporting of that data. This is where the work of analysis is of great importance. The date/time stamp of one tag can have a huge correlation to other tag(s). Carefully storing this in the historian's database is critical to good reporting.
The retrieval of data stored in a data historian is the slowest part of the system to be implemented. Many companies do a great job of putting data into a historian, but then do not go back and retrieve any of the data. Many times this author has gone into a site that claims to have a historian only to find that the data is “in there somewhere”, but has never had a report run against the data to validate the accuracy of the data.
The rule-of-thumb should be to provide feedback on any of the tags entered as soon as possible after storage into the historian. Reporting on the first few entries of a newly added tag is important, but ongoing review is important too. Once the data is incorporated into both a detailed listing and a summarized list the data can be reviewed for accuracy by operations personnel on a regular basis.
This regular review process by the operational personnel is very important. The finest data gathering systems that might historically archive millions of data points will be of little value to anyone if the data is not reviewed for accuracy by those that are experts in that information.
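At its core, the storage behind a historian often boils down to something like the following sketch (names are illustrative): one narrow table of tag samples keyed by tag and date/time stamp, which is exactly what makes time-window retrieval and cross-tag correlation possible.

    # Sketch only: a minimal tag-history table of the kind a data historian maintains.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tag_history (
            tag     TEXT NOT NULL,          -- e.g. 'motor_temp', 'conveyor_speed'
            ts      TEXT NOT NULL,          -- date/time stamp of the sample
            value   REAL,
            quality TEXT DEFAULT 'good',    -- automated vs. human-entered, bad sensor, etc.
            PRIMARY KEY (tag, ts)
        );
        INSERT INTO tag_history (tag, ts, value) VALUES
            ('motor_temp',     '2013-03-01T08:00:00', 71.4),
            ('conveyor_speed', '2013-03-01T08:00:00', 1.2);
    """)

    # Retrieval by tag and time window -- the reporting step the text says is often neglected.
    rows = conn.execute("""
        SELECT ts, value FROM tag_history
         WHERE tag = 'motor_temp'
           AND ts BETWEEN '2013-03-01T00:00:00' AND '2013-03-02T00:00:00'
         ORDER BY ts
    """).fetchall()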
Data Warehouse
Data warehousing combines data from multiple, usually varied, sources into one comprehensive and easily manipulated database. Different methods can then be used by a company or organization to access this data for a wide range of purposes. Analysis can be performed to determine trends over time and to create plans based on this information. Smaller companies often use more limited formats to analyze more precise or smaller data sets, though warehousing can also utilize these methods.
Accessing Data Through Warehousing
Common methods for accessing systems of data warehousing include queries, reporting, and analysis. Because warehousing creates one database, the number of sources can be nearly limitless, provided the system can handle the volume. The final result, however, is homogeneous data, which can be more easily manipulated. System queries are used to gain information from a warehouse and this creates a report for analysis.
Uses for Data Warehouses
Companies commonly use data warehousing to analyze trends over time. They might use it to view day-to-day operations, but its primary function is often strategic planning based on long-term data overviews. From such reports, companies make business models, forecasts, and other projections. Routinely, because the data stored in data warehouses is intended to provide more overview-like reporting, the data is read-only.

Ruby on Rails database and application design

We have to create a rather large Ruby on Rails application based on a large database. This database is updated daily; each table has about 500,000 records (or more) and this number will grow over time. We will also have to provide proper versioning of all data along with referential integrity. It must be possible for the user to move from version to version, which are kind of "snapshots" of the main database at different points in time. In addition, some portions of the data need to be served to other external applications with an API.
Considering the large amounts of data, we thought of splitting the database into pieces:
- the state of the data at the present time
- versioned attributes of each table
- snapshots of the first database at specific, historical points in time
Each of those would have its own application, creating a service with an API to interact with the data. This is needed because we don't want to create multiple applications connecting to multiple databases directly.
The question is: is this the proper approach? If not, what would you suggest?
We've never had any experience with a project of this magnitude and we're trying to find the best possible solution. We don't know if this kind of data separation makes any sense. If so, how do we provide proper communication between the different applications and the individual services, and between the services themselves, as this will also be required?
In general the amount of data in the tables should not be your first concern. In PostgreSQL you have a very large number of options to optimize queries against large tables. The larger question has to do with what exactly you are querying, when, and why. Your query loads are always larger concerns than the amount of data. It's one thing to have ten years of financial data amounting to 4M rows. It's something different to have to aggregate those ten years of data to determine what the balance of the checking account is.
In general it sounds to me like you are trying to create a system that will rely on such aggregates. In that case I recommend the following approach, which I call log-aggregate-snapshot (see the sketch after the three models below). In this, you have essentially three complementary models which work together to provide an up-to-date, well-performing solution. However, the restrictions on this are important to recognize and understand.
Event model. This is append-only, with no updates. In this model only inserts occur; some metadata used by certain queries may be updated, but only when absolutely needed. For a financial application this would be the tables representing the journal entries and lines.
The aggregate closing model. This is append-only (though deletes are allowed for purposes of re-opening periods). This provides roll-forward information for specific purposes. Once a closing entry is in, no entries can be made for a closed period. In a financial application, this would represent closing balances. New balances can be calculated by starting at an aggregation point and rolling forward. You can also use partial indexes to make it easier to pull just the data you need.
Auxiliary data model. This consists of smaller tables which do allow updates, inserts, and deletes, provided that the integrity of the other models is not compromised. In a financial application this might be things like customer or vendor data, employee data, and the like.
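A minimal sketch of that log-aggregate-snapshot layout (illustrative names; SQLite stands in here, but the same structure, including the partial index, carries over to PostgreSQL): the current balance is computed by starting from the latest closing entry and rolling forward only the open-period lines.

    # Sketch of the log-aggregate-snapshot layout with hypothetical financial tables.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- 1. Event model: append-only journal lines, never updated.
        CREATE TABLE journal_lines (
            account_id INTEGER NOT NULL,
            entry_date TEXT    NOT NULL,
            amount     REAL    NOT NULL,
            closed     INTEGER NOT NULL DEFAULT 0
        );

        -- 2. Aggregate closing model: append-only closing balances per period.
        CREATE TABLE closing_balances (
            account_id INTEGER NOT NULL,
            period_end TEXT    NOT NULL,
            balance    REAL    NOT NULL,
            PRIMARY KEY (account_id, period_end)
        );

        -- Partial index: keeps lookups of the still-open lines cheap.
        CREATE INDEX open_lines_idx ON journal_lines (account_id, entry_date)
            WHERE closed = 0;

        -- 3. Auxiliary data model: updatable reference data (accounts, customers, ...).
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    """)

    def current_balance(account_id):
        """Start from the most recent closing balance and roll forward the open lines."""
        row = conn.execute(
            "SELECT balance FROM closing_balances WHERE account_id = ? "
            "ORDER BY period_end DESC LIMIT 1", (account_id,)).fetchone()
        closed = row[0] if row else 0.0
        open_sum = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM journal_lines "
            "WHERE account_id = ? AND closed = 0", (account_id,)).fetchone()[0]
        return closed + open_sum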
