I'm facing a problem with FireDAC Master-Detail relationships.
FireDAC has two modes for M/D relationships, Parameter-Based and Range-Based: http://docwiki.embarcadero.com/RADStudio/Berlin/en/Master-Detail_Relationship_(FireDAC)
The first one uses parameters on every query to retrieve the corresponding details after every scroll, and the second one first loads all the data into the datasets and then sets the fields that define the master-detail relationship (filtering the details after every scroll on the master).
You can combine both methods, getting the advantages of both (queries returning a limited number of records, reduced database traffic, offline mode, ...).
It works nicely and fast, except when one of the details is empty. This seems to be the reason (quoted from the documentation):
Combining Methods
To combine both methods, an application should use both Parameters and
Range-based setups and include fiDetails into FetchOptions.Cache. Then
FireDAC at first uses range-based M/D. And if a dataset is empty, then
FireDAC uses parameter-based M/D. The new queried records are appended
to the internal records storage.
Also, you can use the TFDDataSet.OnMasterSetValues event handler to override M/D behavior.
Suppose you have
Master BILLS
+---------+------------+
| Bill_Id | Date |
+---------+------------+
| 1 | 01/01/2017 |
+---------+------------+
Detail LINES
+---------+---------+------------+
| Bill_Id | Line_Id | Concept |
+---------+---------+------------+
| 1 | 1 | Television |
| 1 | 2 | Computer |
+---------+---------+------------+
Subdetail TAXES
+---------+---------+-----+--------+
| Bill_Id | Line_Id | Tax | Import |
+---------+---------+-----+--------+
| 1 | 1 | 14% | 74.25 |
| 1 | 1 | 7% | 36.12 |
+---------+---------+-----+--------+
I have these 3 FDQueries with parameters:
qryBills.SQL.Text := 'select * from BILLS where Bill_Id = :Id';
qryLines.SQL.Text := 'select * from LINES where Bill_Id = :Id';
qryTaxes.SQL.Text := 'select * from TAXES where Bill_Id = :Id';
And the Master-Detail relationship is defined by range:
qryLines.MasterFields := 'Bill_Id';
qryTaxes.MasterFields := 'Bill_Id;Line_Id';
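For reference, the combined setup the documentation describes (range-based links plus fiDetails in FetchOptions.Cache) would look roughly like this; dsBills and dsLines are assumed TDataSource components wrapping qryBills and qryLines:

// Range-based M/D links; with fiDetails cached, FireDAC falls back to
// the parameter-based queries only when a range comes up empty.
qryLines.MasterSource := dsBills;
qryLines.MasterFields := 'Bill_Id';
qryLines.FetchOptions.Cache := qryLines.FetchOptions.Cache + [fiDetails];

qryTaxes.MasterSource := dsLines;
qryTaxes.MasterFields := 'Bill_Id;Line_Id';
qryTaxes.FetchOptions.Cache := qryTaxes.FetchOptions.Cache + [fiDetails];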
If all the details contain records then everything is fine, but when a detail is empty (like in my example, where there are no taxes for Line #2), scrolling to that empty detail re-launches its query (as the documentation says), duplicating the records of the non-empty details.
I mean:
I open the three datasets for Bill_Id #1.
Everything looks fine: I see the master record, Line #1 and its two taxes.
I move to the second line and it still looks fine; the taxes appear empty.
When I go back to the first line, I now see its two taxes twice.
If I go to the second line again and return to the first one, I now see its two taxes three times.
...
The problem is that every time I move to the second line, its subdetail is empty, so the qryTaxes query is re-launched, duplicating its entire contents.
It's not uncommon to have empty details. Do you know of a way to prevent the query from being re-launched when this happens? I can't find one.
Thank you.
I'm trying to make an efficient query to create a view that will contain counts of the successful logins by day, as well as by type of user, with no duplicate users per day.
There are 3 tables involved in this query: one that contains all successful login attempts, one for standard user accounts, and one for admin user accounts. All user_id values are unique across the entire database, so no user account will share the same user_id with an admin account:
TABLE 1: user_account
user_id | username
---------|----------
1 | user1
2 | user2
TABLE 2: admin_account
user_id | username
---------|----------
6 | admin6
7 | admin7
TABLE 3: successful_logins
user_id | timestamp
---------|------------------------------
1 | 2022-01-23 14:39:12.63798-07
1 | 2022-01-28 11:16:45.63798-07
1 | 2022-01-28 01:53:51.63798-07
2 | 2022-01-28 15:19:21.63798-07
6 | 2022-01-28 09:42:36.63798-07
2 | 2022-01-23 03:46:21.63798-07
7 | 2022-01-28 19:52:16.63798-07
2 | 2022-01-29 23:12:41.63798-07
2 | 2022-01-29 18:50:10.63798-07
The resulting view I would like to generate would contain the following information from the above 3 tables:
VIEW: login_counts
date_of_login | successful_user_logins | successful_admin_logins
---------------|------------------------|-------------------------
2022-01-23 | 1 | 1
2022-01-28 | 2 | 2
2022-01-29 | 1 | 0
I'm currently reading up on how crosstabs work, but I'm having trouble figuring out how to write the query based on my table setup.
I actually was able to get the values I needed by using the following query:
SELECT
  to_char(s.timestamp, 'YYYY-MM-DD') AS login_date,
  count(DISTINCT u.user_id) AS successful_user_logins,
  count(DISTINCT a.user_id) AS successful_admin_logins
FROM successful_logins s
LEFT JOIN user_account u ON u.user_id = s.user_id
LEFT JOIN admin_account a ON a.user_id = s.user_id
GROUP BY login_date
However, I was told it would be even quicker using crosstabs, especially considering the successful_logins table contains millions of records. So I'm trying to also create a crosstab version of the query and compare the execution times of both.
Any help would be greatly appreciated. Thanks!
Turns out it isn't possible to do what I was asking about using crosstabs, so the original query I have will have to do.
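For what it's worth, conditional aggregation with FILTER is another way to get the same pivoted shape without the tablefunc extension. This is only a sketch using the tables above, and whether it actually beats the join version would need measuring:

SELECT
  s.timestamp::date AS date_of_login,
  count(DISTINCT s.user_id) FILTER (WHERE EXISTS
    (SELECT 1 FROM user_account u WHERE u.user_id = s.user_id)
  ) AS successful_user_logins,
  count(DISTINCT s.user_id) FILTER (WHERE EXISTS
    (SELECT 1 FROM admin_account a WHERE a.user_id = s.user_id)
  ) AS successful_admin_logins
FROM successful_logins s
GROUP BY date_of_login;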
I would like to perform a query to remove duplicates. What I define as a duplicate here is a measurement where we have more than one data point for the same timestamp. They have different tags, so they are not overwritten by default, but I would like to remove the oldest inserted points, regardless of the tags.
So for example, a measurement of logins (it doesn't really make sense, but it's to avoid using abstract entities):
> Email | Name | TS | Login Time
>
> a#a.com | Alice | xxxxx1000 | 2017-05-19
> a#a.com | Alice | xxxxx1000 | 2017-05-18
> a#a.com | Alice | xxxxx1000 | 2017-05-17
> b#b.com | Bob | xxxxx1000 | 2017-05-18
> c#c.com | Charlie | xxxxx1200 | 2017-05-19
I would like to remove the second and third lines, because those data points have the same timestamp as the first: it is the same measurement, and although they have different login times I would like to keep only the last one.
I know that I could solve this with a query, but the requirement is more complex than that (visualization in Grafana of weird KPI data), and I need to remove the actual duplicates (generated and loaded twice).
Thank you.
You can fetch all login user names using GROUP BY and then ORDER BY time, so that the latest login time comes up first; then you can delete the remaining ones.
Also, you might need to copy your latest items to another measurement, since you can't remove rows in InfluxDB.
For this you might use LIMIT 1 OFFSET 0, so that only the latest login time comes out of the query.
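In InfluxQL that idea might look like the following sketch (the measurement and field names follow the example above; LIMIT 1 applies per GROUP BY group, so each series keeps only its newest point). The surviving points could then be written into a fresh measurement with SELECT ... INTO:

SELECT "Login Time" FROM "logins" GROUP BY * ORDER BY time DESC LIMIT 1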
Let me know if I understood it correctly.
Let's say I have transaction data and visit data:
visit
| userId | Visit source | Timestamp |
|--------|--------------|-----------|
| A      | google ads   | 1         |
| A      | facebook ads | 2         |
transaction
| userId | total price | timestamp |
|--------|-------------|-----------|
| A      | 100         | 248384    |
| B      | 200         | 43298739  |
I want to join the transaction data and the visit data to do sales attribution. I want to do it in real time, whenever a transaction occurs (streaming).
Is it scalable to do a join between one event and very big historical data using the join function in Spark?
The historical data is the visits, since a visit can happen at any time (e.g. a visit one year before the transaction occurs).
I did a join of historical data and streaming data in my project. The problem here is that you have to cache the historical data in an RDD, and when streaming data comes in, you can do the join operations. But actually this is a long process.
If you are updating the historical data, then you have to keep two copies and use an accumulator to work with either copy at a time, so it won't affect the second copy.
For example:
transactionRDD is the stream RDD which you are running at some interval.
visitRDD is the historical one, which you update once a day.
So you have to maintain two databases for visitRDD. When you are updating one database, transactionRDD can work with the cached copy of visitRDD, and when visitRDD is updated, you switch to that copy. Actually this is very complicated.
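A minimal sketch of that cached-lookup join in Scala with Spark Streaming (the paths, host/port and parsing here are made up for illustration):

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(10)) // sc: an existing SparkContext

// Historical visits, keyed by userId and cached once; replace this RDD
// when the daily refresh happens.
val visitRDD = sc.textFile("hdfs:///data/visits")
  .map { line => val f = line.split(","); (f(0), f(1)) } // (userId, visit source)
  .cache()

// Streaming transactions, keyed by userId.
val transactions = ssc.socketTextStream("localhost", 9999)
  .map { line => val f = line.split(","); (f(0), f(1)) } // (userId, total price)

// Join each micro-batch against the cached historical RDD.
val attributed = transactions.transform(batch => batch.join(visitRDD))
attributed.print()

ssc.start()
ssc.awaitTermination()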
I know this question is very old, but let me share my viewpoint. Today this can easily be done in Apache Beam, and the job can run on the same Spark cluster.
Here is the table with a geometry field:
Table "public.regions"
Column | Type |
-----------+-----------------------+-
id | integer |
parent_id | integer |
level | integer |
name | character varying(55) |
location | geometry |
I have stored the geometry for all continents, countries, states and cities. Since it is a huge table, I need to partition it based on the top-level location (i.e. continent) to improve performance.
How can I partition my existing table based on geometry (continent)? Is it good enough to create inheritance tables named asia, europe, australia, ... and insert rows into those tables based on queries with Contains? Will that improve the performance of my queries?
For example, I am trying to run queries like the following (11.562424 48.148679 is a point in Munich):
EXPLAIN ANALYZE SELECT id, name, level FROM regions
WHERE Contains(location, GeomFromText('POINT(11.562424 48.148679)'));
This takes around 500 ms with PostgreSQL on my computer, whereas the same query takes around 200 ms in Oracle.
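For reference, the inheritance setup being considered would look roughly like this (only a sketch: the continents table holding the continent outlines is hypothetical, and each child table needs its own GiST index). Whether this beats a single table with a GiST index on location would need measuring:

CREATE TABLE regions_asia () INHERITS (regions);
CREATE INDEX regions_asia_location_gix ON regions_asia USING GIST (location);

INSERT INTO regions_asia
SELECT r.*
FROM ONLY regions r
JOIN continents c ON Contains(c.location, r.location)
WHERE c.name = 'Asia';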
I want to make a query for two column families at once... I'm using the cassandra-cql gem for Rails, and my column families are:
users
following
followers
user_count
message_count
messages
Now I want to get all messages from the people a user is following. Is there a kind of multiget with cassandra-cql, or is there any other possibility, by changing the data model, to get this kind of data?
I would call your current data model a traditional entity/relational design. This would make sense with an SQL database: when you have a relational database, you rely on joins to build views that span multiple entities.
Cassandra does not have any ability to perform joins. So instead of modeling your data based on your entities and relations, you should model it based on how you intend to query it. For your example of 'all messages from the people a user is following' you might have a column family where the rowkey is the userid and the columns are all the messages from the people that user follows (where the column name is a timestamp+userid and the value is the message):
RowKey Columns
-------------------------------------------------------------------
| | TimeStamp0:UserA | TimeStamp1:UserB | TimeStamp2:UserA |
| UserID |------------------|------------------|------------------|
| | Message | Message | Message |
-------------------------------------------------------------------
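In modern CQL terms (hypothetical table and column names), that timeline layout corresponds roughly to:

CREATE TABLE timeline (
    user_id   text,      -- the follower whose timeline this is
    posted_at timestamp, -- when the message was written
    author_id text,      -- who wrote it
    message   text,
    PRIMARY KEY (user_id, posted_at, author_id)
) WITH CLUSTERING ORDER BY (posted_at DESC, author_id ASC);

-- all messages from the people 'alice' follows, newest first:
SELECT author_id, message FROM timeline WHERE user_id = 'alice';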
You would probably also want a column family with all the messages a specific user has written (I'm assuming that the message is broadcast to all users instead of being addressed to one particular user):
RowKey Columns
--------------------------------------------------------
| | TimeStamp0 | TimeStamp1 | TimeStamp2 |
| UserID |------------|------------|-------------------|
| | Message | Message | Message |
--------------------------------------------------------
Now when you create a new message, you will need to insert it in multiple places. But when you need to list all messages from the people a user is following, you only need to fetch from one row (which is fast).
Obviously if you support updating or deleting messages you will need to do that everywhere that there is a copy of the message. You will also need to consider what should happen when a user follows or unfollows someone. There are multiple solutions to this problem and your solution will depend on how you want your application to behave.
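For example, a fan-out write for this model might look like the following sketch (hypothetical names; user_messages is assumed to be a CQL table matching the second layout above, and there would be one timeline INSERT per follower):

BEGIN BATCH
  INSERT INTO user_messages (user_id, posted_at, message)
  VALUES ('bob', toTimestamp(now()), 'hello world');
  INSERT INTO timeline (user_id, posted_at, author_id, message)
  VALUES ('alice', toTimestamp(now()), 'bob', 'hello world');
APPLY BATCH;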