Why did autoextend on Oracle XE not work?

We had a problem with our prod environment; suddenly this exception began to appear:
ORA-01654: unable to extend index EMA.TRANSFERI2 by 128 in tablespace SYSTEM
As a solution, my colleague added a new datafile. But the question is: why didn't the autoextend mechanism work? I'm not a DBA, but I checked the configuration and it seems OK to me. It occurs only in the prod environment, so I would rather avoid experimenting.
We have the table in the SYSTEM tablespace, which I already know should be moved to the USERS tablespace. But autoextend should also work on the SYSTEM tablespace. Here is my configuration of the table, datafiles, and tablespace:
TABLESPACE_NAME | PCT_FREE | PCT_USED | INITIAL_EXTENT | NEXT_EXTENT | MIN_EXTENTS | MAX_EXTENTS | PCT_INCREASE
SYSTEM | 10 | 40 | 65536 | 1048576 | 1 | 2147483645 | null
FILE_NAME | FILE_ID | TABLESPACE_NAME | BYTES | BLOCKS | STATUS | RELATIVE_FNO | AUTOEXTENSIBLE | MAXBYTES | MAXBLOCKS | INCREMENT_BY | USER_BYTES | USER_BLOCKS | ONLINE_STATUS
/u01/app/oracle/oradata/XE/system.dbf | 1 | SYSTEM | 629145600 | 76800 | AVAILABLE | 1 | YES | 629145600 | 76800 | 1280 | 628097024 | 76672 | SYSTEM
/u01/app/oracle/oradata/XE/system2.dbf | 5 | SYSTEM | 1048576000 | 128000 | AVAILABLE | 5 | YES | 2147483648 | 262144 | 25600 | 1047527424 | 127872 | SYSTEM
TABLESPACE_NAME | BLOCK_SIZE | INITIAL_EXTENT | NEXT_EXTENT | MIN_EXTENTS | MAX_EXTENTS | MAX_SIZE | PCT_INCREASE | MIN_EXTLEN | STATUS | CONTENTS | EXTENT_MANAGEMENT | ALLOCATION_TYPE | SEGMENT_SPACE_MANAGEMENT | BIGFILE
SYSTEM | 8192 | 65536 | null | 1 | 2147483645 | 2147483645 | null | 65536 | ONLINE | PERMANENT | LOCAL | SYSTEM | MANUAL | NO

The MAXBYTES value for your system.dbf file is set to 629145600 (600 MB), so when the file reached that size it could not be extended any further. It had autoextended up to that point, but would not grow beyond the soft limit specified for the file. That limit was set when the tablespace was created, using the AUTOEXTEND ... MAXSIZE clause.
The limit may have been set because of the size of the underlying file system, to cause an error in case of runaway/unexpected growth, unintentionally, or for some other reason now known only to whoever set the database up.
As an alternative to adding a second datafile, your DBA could have increased the soft limit on the existing file with ALTER DATABASE. But neither should be done lightly: the reason for the original restriction should be understood (especially if the filesystem could run out of space as a result of an increase), and the cause of the growth should be examined too.
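For reference, a minimal sketch of how to inspect the cap and, if appropriate, raise it; the datafile path comes from the question, and the NEXT/MAXSIZE values are purely illustrative:

-- Compare each SYSTEM datafile's current size with its autoextend cap
SELECT file_name, bytes, maxbytes, autoextensible
FROM dba_data_files
WHERE tablespace_name = 'SYSTEM';

-- Raise the soft limit on the existing file (illustrative sizes; first
-- confirm the underlying filesystem actually has the room)
ALTER DATABASE DATAFILE '/u01/app/oracle/oradata/XE/system.dbf'
  AUTOEXTEND ON NEXT 10M MAXSIZE 2G;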


Neo4j Cypher: How to optimize a NOT EXISTS Query when cardinality is high

The query below takes over 1 second and consumes about 7 MB when the cardinality between users and posts is about 8000 (one user views about 8000 posts). It is difficult to scale this because of the high and linearly growing latency and memory consumption. Is there a way to model this differently and/or optimise the query?
Query
PROFILE MATCH (u:User)-[:CREATED]->(p:Post) WHERE NOT (:User{ID: 2})-[:VIEWED]->(p) RETURN p.ID
Plan
| Plan | Statement | Version | Planner | Runtime | Time | DbHits | Rows | Memory (Bytes) |
+-----------------------------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | "CYPHER 4.1" | "COST" | "INTERPRETED" | 1033 | 3721750 | 10 | 6696240 |
+-----------------------------------------------------------------------------------------------------------+
+------------------------------+-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| Operator | Details | Estimated Rows | Rows | DB Hits | Cache H/M | Memory (Bytes) | Ordered by |
+------------------------------+-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| +ProduceResults#neo4j | `p.ID` | 2158 | 10 | 0 | 0/0 | | |
| | +-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| +Projection#neo4j | p.ID AS `p.ID` | 2158 | 10 | 10 | 0/0 | | |
| | +-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| +Filter#neo4j | u:User | 2158 | 10 | 10 | 0/0 | | |
| | +-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| +Expand(All)#neo4j | (p)<-[anon_15:CREATED]-(u) | 2158 | 10 | 20 | 0/0 | | |
| | +-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| +AntiSemiApply#neo4j | | 2158 | 10 | 0 | 0/0 | | |
| |\ +-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| | +Expand(Into)#neo4j | (anon_47)-[anon_61:VIEWED]->(p) | 233 | 0 | 3695819 | 0/0 | 6696240 | anon_47.ID ASC |
| | | +-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| | +NodeUniqueIndexSeek#neo4j | UNIQUE anon_47:User(ID) WHERE ID = $autoint_0 | 8630 | 8630 | 17260 | 0/0 | | anon_47.ID ASC |
| | +-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
| +NodeByLabelScan#neo4j | p:Post | 8630 | 8630 | 8631 | 0/0 | | |
+------------------------------+-----------------------------------------------+----------------+------+---------+-----------+----------------+----------------+
Yes, this can be improved.
First, let's understand what this is doing.
It starts with a NodeByLabelScan. That makes sense; there's no avoiding that.
But then, for every node of the label (the following executes PER ROW!), it matches to user 2 and expands all of user 2's :VIEWED relationships to see if any of them reaches the post for that particular row.
Can you see why this is inefficient? There are 8630 post nodes according to the PROFILE plan, so user 2 is looked up by index 8630 times, and their :VIEWED relationships are expanded 8630 times. Why 8630 times? Because this happens once per :Post node.
Instead, try this:
MATCH (:User{ID: 2})-[:VIEWED]->(viewedPost)
WITH collect(viewedPost) as viewedPosts
MATCH (:User)-[:CREATED]->(p:Post)
WHERE NOT p IN viewedPosts
RETURN p.ID
This changes things up a bit.
First it matches user 2's viewed posts (the lookup and expansion are performed only once), then those viewed posts are collected.
Then it does a label scan and filters so that only posts not in the collection of viewed posts are kept.
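To confirm the improvement, PROFILE the rewritten query; the NodeUniqueIndexSeek and the expansion of user 2's :VIEWED relationships should now appear only once, feeding a single collect, instead of sitting under an AntiSemiApply:

PROFILE
MATCH (:User{ID: 2})-[:VIEWED]->(viewedPost)
WITH collect(viewedPost) as viewedPosts
MATCH (:User)-[:CREATED]->(p:Post)
WHERE NOT p IN viewedPosts
RETURN p.ID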

Time series binary classification [closed]

Problem:
I have a dataset about hedge funds. It contains monthly hedge fund returns and some financial metrics. I calculated the metrics for every month from 2010 to December 2019 (2889 monthly rows). I want to do binary classification and predict each hedge fund's class for the next month based on these metrics; that is, predict T+1 from time T. I want to use random forest and other classifiers (decision tree, KNN, SVM, logistic regression). I know this dataset is a time series problem; how do I convert it into a machine learning problem?
I am open to your suggestions and advice on what method or approach should be followed for modeling, feature engineering, and preparing this dataset.
Additional Questions:
1) How should I split this data into training and test sets? 0.80/0.20? Is there another validation method you can recommend?
2) Some funds were added to the data later, so not all funds have data of equal length; for example, the "AEB" fund, established in 2015, has no data before 2015. There are a few such funds. Do they cause problems, or is it better to remove them from the dataset? I have 27 different funds in total.
3) I have also changed the hedge funds' tickers/names to numeric IDs. Is dummy encoding possible here, and would it improve performance?
Sample Dataset:
Date       | Fund Name / Ticker | sharpe | sortino | beta  | alpha | target
-----------|--------------------|--------|---------|-------|-------|-------
31.03.2010 | ABC                | -0.08  | 0.025   | 0.6   | 0.13  | 1
31.03.2010 | DEF                | 0.41   | 1.2     | 1.09  | 0.045 | 0
31.03.2010 | SDF                | 0.03   | 0.13    | 0.99  | -0.07 | 1
31.03.2010 | CBD                | 0.71   | -0.05   | 1.21  | 0.2   | 1
30.04.2010 | ABC                | 0.05   | -0.07   | 0.41  | 0.04  | 0
30.04.2010 | DEF                | 0.96   | 0.2     | 1.09  | 1.5   | 0
30.04.2010 | SDF                | -0.06  | 0.23    | 0.13  | 0.23  | 0
30.04.2010 | CBD                | 0.75   | -0.01   | 0.97  | -0.06 | 1
:          | :                  | :      | :       | :     | :     | :
:          | :                  | :      | :       | :     | :     | :
30.12.2019 | ABC                | 0.05   | -0.07   | 0.41  | 0.04  | 1
30.12.2019 | DEF                | 0.96   | 0.2     | 1.09  | 1.5   | 0
30.12.2019 | SDF                | -0.06  | 0.23    | 0.13  | 0.23  | 0
30.12.2019 | CBD                | 0.75   | -0.01   | 0.97  | -0.06 | 1
30.12.2019 | FGF                | 1.45   | 0.98    | -0.03 | 0.55  | 1
30.12.2019 | AEB                | 0.25   | 1.22    | 0.17  | -0.44 | 0
My Idea and First Try:
I modeled one example. I shifted the target variable back by one period (shift(-1)), so each row showed the class the fund was in during the following month. I did this because I want to predict the next month before that month starts: predict T+1 from T. But this model gave a very poor result (43%).
A view of this model's dataset:
Date       | Fund Name / Ticker | sharpe | sortino | beta  | alpha | target
-----------|--------------------|--------|---------|-------|-------|-------
31.03.2010 | ABC                | -0.08  | 0.025   | 0.6   | 0.13  | 1
31.03.2010 | DEF                | 0.41   | 1.2     | 1.09  | 0.045 | 0
31.03.2010 | SDF                | 0.03   | 0.13    | 0.99  | -0.07 | 1
31.03.2010 | CBD                | 0.71   | -0.05   | 1.21  | 0.2   | 1
30.04.2010 | ABC                | 0.05   | -0.07   | 0.41  | 0.04  | 0
30.04.2010 | DEF                | 0.96   | 0.2     | 1.09  | 1.5   | 0
30.04.2010 | SDF                | -0.06  | 0.23    | 0.13  | 0.23  | 0
30.04.2010 | CBD                | 0.75   | -0.01   | 0.97  | -0.06 | 1
:          | :                  | :      | :       | :     | :     | :
:          | :                  | :      | :       | :     | :     | :
30.12.2019 | ABC                | 0.05   | -0.07   | 0.41  | 0.04  | 0
30.12.2019 | DEF                | 0.96   | 0.2     | 1.09  | 1.5   | 0
30.12.2019 | SDF                | -0.06  | 0.23    | 0.13  | 0.23  | 1
30.12.2019 | CBD                | 0.75   | -0.01   | 0.97  | -0.06 | 1
30.12.2019 | FGF                | 1.45   | 0.98    | -0.03 | 0.55  | 0
30.12.2019 | AEB                | 0.25   | 1.22    | 0.17  | -0.44 | ?
There are many approaches out there that you can find. Time series are challenging, and it's okay to have poor results at the beginning. I advise you to do the following (see the sketch below):
Add some lags as additional columns in your dataset. You want to predict t+1 and you have t, so also compute t-1, t-2, t-3, etc.
To choose how many lags t-x to keep, draw ACF and PACF plots and look for the first lags that fall outside the shaded confidence region.
Lags might boost your accuracy.
Try to normalize/standardize your data when modeling.
Check whether your time series is a random walk; if it is, there are many recent papers that try to tackle the problem of random-walk prediction.
If your dataset is big enough, try some neural networks such as LSTMs, RNNs, GANs, etc., which might do better than the shallow models you mentioned.
I really advise you to look at Jason Brownlee's tutorials on time series. You can always add comments to his tutorials, and he is responsive!
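To make the shift(-1) idea and the lag advice concrete, here is a minimal pandas sketch; the file name and column names follow the sample table and are otherwise assumptions, and the 0.80/0.20 cutoff is only illustrative:

import pandas as pd

# Hypothetical file; columns as in the sample: Date, Fund, sharpe, sortino, beta, alpha, target
df = pd.read_csv("funds.csv", parse_dates=["Date"], dayfirst=True)
df = df.sort_values(["Fund", "Date"])

# Shift the target back one month per fund: each row now predicts T+1 from T
df["target_next"] = df.groupby("Fund")["target"].shift(-1)

# Add lagged copies of the features as extra columns (t-1, t-2, t-3)
for lag in (1, 2, 3):
    for col in ("sharpe", "sortino", "beta", "alpha"):
        df[f"{col}_lag{lag}"] = df.groupby("Fund")[col].shift(lag)

# Drop rows whose label or lags are undefined (each fund's first/last months)
df = df.dropna()

# Chronological 0.80/0.20 split; never shuffle a time series
cutoff = df["Date"].quantile(0.8)
train, test = df[df["Date"] <= cutoff], df[df["Date"] > cutoff]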

How to interpret and use Emokit data?

I am using EmoKit (https://github.com/openyou/emokit) to retrieve data. The sample data looks as follows:
+========================================================+
| Sensor | Value | Quality | Quality L1 | Quality L2 |
+--------+----------+----------+------------+------------+
| F3 | -768 | 5672 | None | Excellent |
| FC5 | 603 | 7296 | None | Excellent |
| AF3 | 311 | 7696 | None | Excellent |
| F7 | -21 | 296 | Nothing | Nothing |
| T7 | 433 | 104 | Nothing | Nothing |
| P7 | 581 | 7592 | None | Excellent |
| O1 | 812 | 7760 | None | Excellent |
| O2 | 137 | 6032 | None | Excellent |
| P8 | 211 | 5912 | None | Excellent |
| T8 | -51 | 6624 | None | Excellent |
| F8 | 402 | 7768 | None | Excellent |
| AF4 | -52 | 7024 | None | Excellent |
| FC6 | 249 | 6064 | None | Excellent |
| F4 | 509 | 5352 | None | Excellent |
| X | -2 | N/A | N/A | N/A |
| Y | 0 | N/A | N/A | N/A |
| Z | ? | N/A | N/A | N/A |
| Batt | 82 | N/A | N/A | N/A |
+--------+----------+----------+------------+------------+
|Packets Received: 3101 | Packets Processed: 3100 |
| Sampling Rate: 129 | Crypto Rate: 129 |
+========================================================+
Are these values in microvolts? If so, how can they be more than 200 microvolts? EEG data is in the range of 0-200 microvolts. Or does this require some kind of processing? If so, what?
As described in emokit's frequently asked questions:
What unit is the data I'm getting back in? How do I get volts out of it?
One least-significant-bit of the fourteen-bit value you get back is 0.51 microvolts. See the specification for more details.
Looking for the details in the specification (via archive.org), we find the following for the "Emotiv EPOC Neuroheadset":
Resolution | 14 bits, 1 LSB = 0.51μV (16-bit ADC, 2 bits of instrumental noise floor discarded)
Dynamic range (input referred) | 8400μV (pp)
As a validation, we can check that for a 14-bit linear ADC, the 8400 microvolts (peak-to-peak) would be divided into 16384 steps of approximately 0.5127 microvolts each (8400 / 16384).
For the EPOC+, the comparison chart indicates a 14-bit and a 16-bit version (with a +/- 4.17mV dynamic range, i.e. 8340 microvolts peak-to-peak). The 16-bit version would then have raw data steps of 8340 / 65536, or approximately 0.127 microvolts. If that is what you are using, then the largest value you listed, 812, would correspond to 812 * 0.127 = 103 microvolts.
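As a quick worked example using only the figures quoted above (treat the constants as spec values to re-check against your particular headset):

# Convert raw emokit readings to approximate microvolts
UV_PER_LSB_14BIT = 8400 / 2**14   # ~0.5127 uV per step (EPOC, 14-bit)
UV_PER_LSB_16BIT = 8340 / 2**16   # ~0.1273 uV per step (EPOC+, 16-bit)

raw_o1 = 812                          # the O1 reading from the sample above
print(raw_o1 * UV_PER_LSB_14BIT)      # ~416 uV if from the 14-bit headset
print(raw_o1 * UV_PER_LSB_16BIT)      # ~103 uV if from the 16-bit EPOC+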

Heroku not listing pg_search_documents table

This is pretty weird, but for some reason Heroku doesn't show the pg_search_documents table when I list tables using the heroku-sql-console.
>> heroku sql
SQL> show tables
+------------------------+
| table_name |
+------------------------+
| activity_notifications |
| attachments |
| businesses |
| color_modes |
| comments |
| counties |
| customer_employees |
| customers |
| delayed_jobs |
| file_imports |
| invitations |
| invoices |
| jobs |
| paper_stocks |
| paper_weights |
| quotes |
| rails_admin_histories |
| schema_migrations |
| tax_rates |
| users |
+------------------------+
As you can see, no mention of pg_search.
Then, in the same session,
SQL> select * from pg_search_documents;
+---------------------------------------------------------------------------------------------------------------------+
| id | content | searchable_id | searchable_type | created_at | updated_at |
+---------------------------------------------------------------------------------------------------------------------+
| 3 | Energy Centre | 3 | Customer | 2012-12-03 19:33:55 -0800 | 2012-12-03 19:33:55 -0800 |
+---------------------------------------------------------------------------------------------------------------------+
It's also interesting that the show tables command lists only 20 tables, whereas heroku pg:info says there are 21.
The reason this is a problem rather than a curiosity is that I can't get heroku db:pull to pull down the pg_search_documents table (everything else pulls fine), so I can't test migrations on production data.
I'm using PG Version: 9.1.6 on heroku and PostgreSQL 9.2.1 locally. Also PgSearch 0.5.7.
Any ideas what the issue is?
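One diagnostic worth running in the same session is to ask PostgreSQL directly which schema the table lives in; if it sits outside the schema the console lists from, or is owned by a different role, that would explain the discrepancy (a guess to verify, not a confirmed cause):

SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_name = 'pg_search_documents';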

Access violation while the program was idle - no trace information to track down the bug

I have a program that just popped up an access violation. Until now EurekaLog could find the source code line that generated the error, but now it displays only this:
Access violation at address 7E452E4E in module 'USER32.dll'. Read of address 00000015.
Call Stack Information:
--------------------------------------------------------------------------------------------
|Address |Module |Unit |Class|Procedure/Method |Line |
--------------------------------------------------------------------------------------------
|Running Thread: ID=2640; Priority=0; Class=; [Main] |
|------------------------------------------------------------------------------------------|
|77F16A7E|GDI32.dll | | |IntersectClipRect | |
|7E433000|USER32.dll | | |EditWndProc | |
|7E42A993|USER32.dll | | |CallWindowProcA | |
|7E42A97D|USER32.dll | | |CallWindowProcA | |
|7E429011|USER32.dll | | |OffsetRect | |
|7E4196C2|USER32.dll | | |DispatchMessageA | |
|7E4196B8|USER32.dll | | |DispatchMessageA | |
|00625E13|Amper.exe |Amper.DPR | | |76[16]|
|7C915511|ntdll.dll | | |RtlFindActivationContextSectionString| |
|7C915D61|ntdll.dll | | |RtlFindCharInUnicodeString | |
|7C910466|ntdll.dll | | |RtlFreeUnicodeString | |
|7C80B87C|kernel32.dll | | |IsDBCSLeadByte | |
|7C9113ED|ntdll.dll | | |RtlDeleteCriticalSection | |
|7C80EEF5|kernel32.dll | | |FindClose | |
|7C901000|ntdll.dll | | |RtlEnterCriticalSection | |
|7C912CFF|ntdll.dll | | |LdrLockLoaderLock | |
|7C9010E0|ntdll.dll | | |RtlLeaveCriticalSection | |
|7C912D19|ntdll.dll | | |LdrUnlockLoaderLock | |
|7C9166C1|ntdll.dll | | |LdrGetDllHandleEx | |
|7C9166B3|ntdll.dll | | |LdrGetDllHandle | |
|7C9166A0|ntdll.dll | | |LdrGetDllHandle | |
|7C912A8D|ntdll.dll | | |RtlUnicodeToMultiByteN | |
|7C912C21|ntdll.dll | | |RtlUnicodeStringToAnsiString | |
|7C901000|ntdll.dll | | |RtlEnterCriticalSection | |
|7C912CC9|ntdll.dll | | |LdrLockLoaderLock | |
|7C912CFF|ntdll.dll | | |LdrLockLoaderLock | |
|7C9010E0|ntdll.dll | | |RtlLeaveCriticalSection | |
|7C912D19|ntdll.dll | | |LdrUnlockLoaderLock | |
|7C90CF78|ntdll.dll | | |ZwAllocateVirtualMemory | |
|7C90CF6E|ntdll.dll | | |ZwAllocateVirtualMemory | |
|7C9010E0|ntdll.dll | | |RtlLeaveCriticalSection | |
|7C80BA57|kernel32.dll | | |VirtualQueryEx | |
|7C80BA40|kernel32.dll | | |VirtualQueryEx | |
|7C80BA81|kernel32.dll | | |VirtualQuery | |
|7C901000|ntdll.dll | | |RtlEnterCriticalSection | |
|7C912CC9|ntdll.dll | | |LdrLockLoaderLock | |
|7C912CFF|ntdll.dll | | |LdrLockLoaderLock | |
|7C9010E0|ntdll.dll | | |RtlLeaveCriticalSection | |
--------------------------------------------------------------------------------------------
The program was totally idle when I got the error, and its window was hidden by other windows. FastMM is active and set to full debug mode, but it indicates no memory overwrite.
Any hints about how to find the origin of this AV?
Win XP, Delphi 7
I don't see an EditWndProc() method in user32.dll, but Delphi has a couple -- one dealing with combobox messages and one dealing with tree views. Given MS's comctrl mess, I'd guess you have a tree view?
Check your tree view stuff. Given IntersectClipRect's parameters, it's easy to guess that it's being passed an invalid device context. So: are you doing any custom painting for your tree view? If so, are you checking that the canvas handle is not nil before you begin painting (try assertions if nothing else)?
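A minimal sketch of that guard, assuming a custom-draw handler on a tree view (the form and handler names are hypothetical):

procedure TMainForm.TreeView1CustomDraw(Sender: TCustomTreeView;
  const ARect: TRect; var DefaultDraw: Boolean);
begin
  // Assert the canvas is usable before painting, so a bad device
  // context fails loudly here instead of deep inside USER32/GDI32.
  Assert(Sender.Canvas <> nil, 'tree view canvas not assigned');
  Assert(Sender.Canvas.HandleAllocated, 'no device context behind the canvas');
  // ... custom painting goes here ...
end;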
I just wonder what's on line 76[16] in Amper.exe... That line number might be a hint to the location of the error.
Then again, when it happens during an idle moment, it basically happens while the system is processing Windows messages: mouse moves, keyboard events, timer updates, and a lot more.
It sometimes helps to search for the error message plus the code. I did a quick scan and found a KB article from MS suggesting that this kind of error can happen when you call certain Windows APIs with invalid parameters. That KB doesn't apply to your error, but it gives you an idea of what to check: any Windows API call you make in your own code.
Does it also generate this exception in the IDE, while you're debugging?
That's what EurekaLog does when it has nothing to work with. You need to rebuild and have the linker produce a detailed map file (in Delphi 7: Project > Options > Linker > Map file > Detailed); that's how EurekaLog knows what to apply its stack trace to.
