question about TDengine aggregation usage - time-series

I'm using TDengine for data processing.
Is there someone know how to do aggregation by fixed step, not by time window?
original data
ts | val |
======================================================
2021-04-12 17:31:00.000 | 5.89000 |
2021-04-12 17:32:00.000 | 5.86000 |
2021-04-12 17:33:00.000 | 5.85000 |
2021-04-12 17:34:00.000 | 5.84000 |
2021-04-12 17:35:00.000 | 5.82000 |
2021-04-12 17:36:00.000 | 5.84000 |
2021-04-12 17:37:00.000 | 5.83000 |
2021-04-12 17:38:00.000 | 5.83000 |
2021-04-12 17:39:00.000 | 5.84000 |
2021-04-12 17:40:00.000 | 5.84000 |
2021-04-12 17:41:00.000 | 5.84000 |
2021-04-12 17:42:00.000 | 5.85000 |
2021-04-12 17:43:00.000 | 5.84000 |
2021-04-12 17:44:00.000 | 5.85000 |
such as sum(val) by step 5,expected
2021-04-12 17:35:00.000 29.26
2021-04-12 17:40:00.000 29.18
2021-04-12 17:44:00.000 29.18
thanks,

You can use INTERVAl and SlIDING functions to aggregation. Documents: https://www.taosdata.com/en/documentation/taos-sql#functions

Related

Neo4j Cypher: How to optimize a NOT EXISTS Query when cardinality is high

The below query takes over 1 second & consumer about 7 MB when cardinality b/w users to posts is about 8000 (one user views about 8000 posts). It is difficult to scale this due to high & linearly growing latencies & memory consumption. Is there a possibility to model this differently and/or optimise the query?
Query
PROFILE MATCH (u:User)-[:CREATED]->(p:Post) WHERE NOT (:User{ID: 2})-[:VIEWED]->(p) RETURN p.ID
Plan
| Plan | Statement | Version | Planner | Runtime | Time | DbHits | Rows | Memory (Bytes) |
+-----------------------------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | "CYPHER 4.1" | "COST" | "INTERPRETED" | 1033 | 3721750 | 10 | 6696240 |
+-----------------------------------------------------------------------------------------------------------+
+------------------------------+-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| Operator | Details | Estimated Rows | Rows | DB Hits | Cache H/M | Memory (Bytes) | Ordered by |
+------------------------------+-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| +ProduceResults#neo4j | `p.ID` | 2158 | 10 | 0 | 0/0 | | |
| | +-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| +Projection#neo4j | p.ID AS `p.ID` | 2158 | 10 | 10 | 0/0 | | |
| | +-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| +Filter#neo4j | u:User | 2158 | 10 | 10 | 0/0 | | |
| | +-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| +Expand(All)#neo4j | (p)<-[anon_15:CREATED]-(u) | 2158 | 10 | 20 | 0/0 | | |
| | +-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| +AntiSemiApply#neo4j | | 2158 | 10 | 0 | 0/0 | | |
| |\ +-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| | +Expand(Into)#neo4j | (anon_47)-[anon_61:VIEWED]->(p) | 233 | 0 | 3695819 | 0/0 | 6696240 | anon_47.ID ASC |
| | | +-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| | +NodeUniqueIndexSeek#neo4j | UNIQUE anon_47:User(ID) WHERE ID = $autoint_0 | 8630 | 8630 | 17260 | 0/0 | | anon_47.ID ASC |
| | +-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| +NodeByLabelScan#neo4j | p:Post | 8630 | 8630 | 8631 | 0/0 | | |
+------------------------------+-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
Yes, this can be improved.
First, let's understand what this is doing.
First, it starts with a NodeByLabelScan. That makes sense, there's no avoiding that.
But then, for every node of the label (the following executes PER ROW!), it matches to user 2, and expands all :VIEWED relationships from user 2 to see if any of them is the post for that particular row.
Can you see why this is inefficient? There are 8630 post nodes according to the PROFILE plan, so user 2 is looked up by index 8630 times, and their :VIEWED relationships are expanded 8630 times. Why 8630 times? Because this is happening per :Post node.
Instead, try this:
MATCH (:User{ID: 2})-[:VIEWED]->(viewedPost)
WITH collect(viewedPost) as viewedPosts
MATCH (:User)-[:CREATED]->(p:Post)
WHERE NOT p IN viewedPosts
RETURN p.ID
This changes things up a bit.
First it matches to user 2's viewed posts (the lookup and expansion is performed only once), then those viewed posts are collected.
Then it will do a label scan, and filter such that the post isn't in the collection of viewed posts.

Time series binary classfication [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Improve this question
Problem:
I have a dataset about hedge fund. It contains monthly hedge fund returns and some financial metrics. I calculated metrics for every month from 2010 to 2019 December. (2889 monthly data) I want to binary classification and predict hedge funds' class basis on these metrics for next month. I want make prediction for T+1 from T time. And i want use random forest and other classifiers(Decision Tree,KNN,SVM,logistic regression). I know this dataset is time series problem, how do i convert this to machine learning problem.
I am open to your suggestions and advisories as to what method or approach should be followed in modeling, feature engineering and editing this data set.
Additional Questions:
1)How do I make a data split when using this data for training and test ? 0,80-0,20?. Is there any other method of validation you can recommend?
2)some funds are added to the data later, so not all funds have data of equal length, for example, the "AEB" fund established in 2015 has no data before 2015. There are a few such funds, do they cause problems, or is it better to delete them and remove them from the dataset? I have a total of 27 different fund data.
3)In addition, I have changed the tickers/names of the hedge funds to numeric ID, is it possible to do dummy encoding, would it be better for performance?
Sample Dataset:
Date | Fund Name / Ticker | sharpe | sortino | beta | alpha | target |
------------|--------------------|--------|---------|-------|-------|--------|--
31.03.2010 | ABC | -0,08 | 0,025 | 0,6 | 0,13 | 1 |
31.03.2010 | DEF | 0,41 | 1,2 | 1,09 | 0,045 | 0 |
31.03.2010 | SDF | 0,03 | 0,13 | 0,99 | -0,07 | 1 |
31.03.2010 | CBD | 0,71 | -0,05 | 1,21 | 0,2 | 1 |
30.04.2010 | ABC | 0,05 | -0,07 | 0,41 | 0,04 | 0 |
30.04.2010 | DEF | 0,96 | 0,2 | 1,09 | 1,5 | 0 |
30.04.2010 | SDF | -0,06 | 0,23 | 0,13 | 0,23 | 0 |
30.04.2010 | CBD | 0,75 | -0,01 | 0,97 | -0,06 | 1 |
: | : | : | : | : | : | : |
: | : | : | : | : | : | : |
30.12.2019 | ABC | 0,05 | -0,07 | 0,41 | 0,04 | 1 |
30.12.2019 | DEF | 0,96 | 0,2 | 1,09 | 1,5 | 0 |
30.12.2019 | SDF | -0,06 | 0,23 | 0,13 | 0,23 | 0 |
30.12.2019 | CBD | 0,75 | -0,01 | 0,97 | -0,06 | 1 |
30.12.2019 | FGF | 1,45 | 0,98 | -0,03 | 0,55 | 1 |
30.12.2019 | AEB | 0,25 | 1,22 | 0,17 | -0,44 | 0 |
My Idea and First Try:
I modeled one example. I used a method like this, I shifted(-1) back the target variable. So each line was shown the class in which the fund was located in the following month.I did it because of this, I want to predict the next month before that month starts. Predict to T+1 from T.But this model gave a very poor result.(%43)
view of this model dataset:
Date | Fund Name / Ticker | sharpe | sortino | beta | alpha | target |
------------|--------------------|--------|---------|-------|-------|--------|--
31.03.2010 | ABC | -0,08 | 0,025 | 0,6 | 0,13 | 1 |
31.03.2010 | DEF | 0,41 | 1,2 | 1,09 | 0,045 | 0 |
31.03.2010 | SDF | 0,03 | 0,13 | 0,99 | -0,07 | 1 |
31.03.2010 | CBD | 0,71 | -0,05 | 1,21 | 0,2 | 1 |
30.04.2010 | ABC | 0,05 | -0,07 | 0,41 | 0,04 | 0 |
30.04.2010 | DEF | 0,96 | 0,2 | 1,09 | 1,5 | 0 |
30.04.2010 | SDF | -0,06 | 0,23 | 0,13 | 0,23 | 0 |
30.04.2010 | CBD | 0,75 | -0,01 | 0,97 | -0,06 | 1 |
: | : | : | : | : | : | : |
: | : | : | : | : | : | : |
30.12.2019 | ABC | 0,05 | -0,07 | 0,41 | 0,04 | 0 |
30.12.2019 | DEF | 0,96 | 0,2 | 1,09 | 1,5 | 0 |
30.12.2019 | SDF | -0,06 | 0,23 | 0,13 | 0,23 | 1 |
30.12.2019 | CBD | 0,75 | -0,01 | 0,97 | -0,06 | 1 |
30.12.2019 | FGF | 1,45 | 0,98 | -0,03 | 0,55 | 0 |
30.12.2019 | AEB | 0,25 | 1,22 | 0,17 | -0,44 | ? |
There are many approaches out there that you can find. Time series are challenging and its okay to have poor results at the beginning. I advise you do the following:
Add some lags as additional columns in your dataset. You want to predict t+1 and you have t, so try to also compute t-1,t-2, t-3, etc.
In order to know the best number of t-x that you can have, try to do ACF and PACF plots and see the first lags that appear in the in the shaded region
Lags might boost your accuracy
try to normalize/standardize your data when modeling
Try to see if your time series is a random walk, if it is, there are many recent papers that try to tackle the problem of random walk prediction
If your dataset is big enough, try to use some neural networks like LSTM, RNN, GANs, etc that might be better than the shallow models you have mentioned
I really advise you to see the tutorials of Jason Brownlee on Time Series here Jason is super intelligent and you can always add comments to his tutorials. He is also responsive!!

InfluxDB select different time between two row having same a field value

I have a table like this on InfluxDB:
+---------------+-----------------+--------+--------------------------+
| time | sequence_number | action | session_id |
+---------------+-----------------+--------+--------------------------+
| 1433322591220 | 270001 | delete | 556d85bfe26c3b3864617605 |
| 1433322553324 | 250001 | delete | 556d88e4e26c3b3b83c99d32 |
| 1433241828472 | 230001 | create | 556d88e4e26c3b3b83c99d32 |
| 1433241023633 | 80001 | create | 556d85bfe26c3b3864617605 |
| 1433239305306 | 70001 | create | 556d7f09e26c3b34e872b2ba |
+---------------+-----------------+--------+--------------------------+
Now I want to find the time range from a session be created to deleted, that means get time where action=delete minus time where action=create if they have same session_id

How to create scrollview pagination with each page contain 4 image from plist/nsdictionary

I need to make an app that using a scrollview with pagecontrol to view image gallery, each page contains about 4 image and the image are loaded from plist or nsdictionary for example it will look like "WallpapersHD" in appstore.
the example:
________ ________ ________ ________
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
|________| |________| |________| |________|
________ ________ ==SWIPE==> ________
| | | | | |
| | | | | |
| | | | | |
| | | | | |
|________| |________| |________|
First Pge Second Page
I really don't know how to start,, how to make the imageview have dependency with theimageurl data on plist file.
As you can see in the first page there is a four imageview but in the second page there is only three imageview it because the data in plist is only seven imageurl..
so I need your help to do paging in ios
MGGridView is what you need. There's also an example app with the code
Good luck

Access violation while the program was idle - not trace information to track down the bug

I have a program that just popped up an AV. Until now the Eureka Log could find the source code line that generated the error but now it displays only this:
Access violation at address 7E452E4E in module 'USER32.dll'. Read of address 00000015.
Call Stack Information:
--------------------------------------------------------------------------------------------
|Address |Module |Unit |Class|Procedure/Method |Line |
--------------------------------------------------------------------------------------------
|Running Thread: ID=2640; Priority=0; Class=; [Main] |
|------------------------------------------------------------------------------------------|
|77F16A7E|GDI32.dll | | |IntersectClipRect | |
|7E433000|USER32.dll | | |EditWndProc | |
|7E42A993|USER32.dll | | |CallWindowProcA | |
|7E42A97D|USER32.dll | | |CallWindowProcA | |
|7E429011|USER32.dll | | |OffsetRect | |
|7E4196C2|USER32.dll | | |DispatchMessageA | |
|7E4196B8|USER32.dll | | |DispatchMessageA | |
|00625E13|Amper.exe |Amper.DPR | | |76[16]|
|7C915511|ntdll.dll | | |RtlFindActivationContextSectionString| |
|7C915D61|ntdll.dll | | |RtlFindCharInUnicodeString | |
|7C910466|ntdll.dll | | |RtlFreeUnicodeString | |
|7C80B87C|kernel32.dll | | |IsDBCSLeadByte | |
|7C9113ED|ntdll.dll | | |RtlDeleteCriticalSection | |
|7C80EEF5|kernel32.dll | | |FindClose | |
|7C901000|ntdll.dll | | |RtlEnterCriticalSection | |
|7C912CFF|ntdll.dll | | |LdrLockLoaderLock | |
|7C9010E0|ntdll.dll | | |RtlLeaveCriticalSection | |
|7C912D19|ntdll.dll | | |LdrUnlockLoaderLock | |
|7C9166C1|ntdll.dll | | |LdrGetDllHandleEx | |
|7C9166B3|ntdll.dll | | |LdrGetDllHandle | |
|7C9166A0|ntdll.dll | | |LdrGetDllHandle | |
|7C912A8D|ntdll.dll | | |RtlUnicodeToMultiByteN | |
|7C912C21|ntdll.dll | | |RtlUnicodeStringToAnsiString | |
|7C901000|ntdll.dll | | |RtlEnterCriticalSection | |
|7C912CC9|ntdll.dll | | |LdrLockLoaderLock | |
|7C912CFF|ntdll.dll | | |LdrLockLoaderLock | |
|7C9010E0|ntdll.dll | | |RtlLeaveCriticalSection | |
|7C912D19|ntdll.dll | | |LdrUnlockLoaderLock | |
|7C90CF78|ntdll.dll | | |ZwAllocateVirtualMemory | |
|7C90CF6E|ntdll.dll | | |ZwAllocateVirtualMemory | |
|7C9010E0|ntdll.dll | | |RtlLeaveCriticalSection | |
|7C80BA57|kernel32.dll | | |VirtualQueryEx | |
|7C80BA40|kernel32.dll | | |VirtualQueryEx | |
|7C80BA81|kernel32.dll | | |VirtualQuery | |
|7C901000|ntdll.dll | | |RtlEnterCriticalSection | |
|7C912CC9|ntdll.dll | | |LdrLockLoaderLock | |
|7C912CFF|ntdll.dll | | |LdrLockLoaderLock | |
|7C9010E0|ntdll.dll | | |RtlLeaveCriticalSection | |
--------------------------------------------------------------------------------------------
The program was totally idle while I got the error and its window was hidden by other windows. FastMM is active and set to full debug but it indicates no memory overwrite.
Any hints about how to find the origin of this AV?
Win XP, Delphi 7
I don't see an EditWndProc() method in user32.dll, but Delphi has a couple -- one dealing with combobox messages and one dealing with tree views. Given MS's comctrl mess, I'd guess you have a tree view?
Check your tree view stuff. Given IntersectClipRect's parameters, it's easy to guess that it's being passed an invalid device context -- so...are you doing any custom painting for your tree view? If so, are you checking to make sure the canvas handle is ! NIL before you begin painting (try assertions if nothing else)?
I just wonder what's on line 76[16] in Amper.exe... That line number might be a hint of the location of the error.
Then again, when it's just happening during an idle moment then it basically happens when the system is processing Windows messages like the mouse moving, keyboard events, timer updates and a lot more.
It sometimes helps to search for the error message plus code. I've done a quick scan and found this KB from MS which suggests that this kind of error can happen when you call certain Windows API's with invalid parameters. But this KB doesn't apply to your error. Still, it gives you an idea about what to check: any Windows API call you make in your own code.
Does it also generate this exception in the IDE, while you're debugging?
That's what EurekaLog does when it has nothing to work with. You need to rebuild and have the linker produce a detailed map file. That's how it knows what to apply its stack trace to.

Resources