DashDB sync with Cloudant doesn't work - data-synchronization

I had a setup to sync data from a Cloudant database into DashDB. Initially the setup and processes were working well, and I kept the sync processes running after the setup. A few days later, I inserted a record into my Cloudant database and expected it to be populated in DashDB automatically. But that didn't happen.
When I checked the sync process after this issue, I wanted to set it to 'Pause' and then 'Resume' it, but a popup window showing "Initialization in Progress" blocked me from doing anything about it.
Now my sync processes are just hanging there, and no data is being synced at all.
Any suggestions for solving the issue?
Best Regards

Cloudant runs a continuous transformation to DashDB once a warehouse is created, and the transformation may have hit an error. You can check whether there is an error in the warehouser docs: take a look at the document inside the _warehouser database and look for the warehouser_error_message field to see whether a transformation issue occurred.
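One way to inspect this is over the Cloudant HTTP API. A minimal sketch in Python (the account name and credentials are placeholders, and the exact document layout may differ from this):

import requests

# Placeholders: substitute your Cloudant account and credentials.
ACCOUNT = "myaccount"
AUTH = ("username", "password")

# Fetch every document in the _warehouser database and report any
# that carry a warehouser_error_message field.
url = f"https://{ACCOUNT}.cloudant.com/_warehouser/_all_docs"
resp = requests.get(url, params={"include_docs": "true"}, auth=AUTH)
resp.raise_for_status()

for row in resp.json()["rows"]:
    doc = row.get("doc") or {}
    if doc.get("warehouser_error_message"):
        print(doc["_id"], doc["warehouser_error_message"])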

Related

FreeRADIUS accounting altering/updating session start times after a day, weeks, and in some cases months

This might be a very specific problem or just ignorance on my side, but I can't seem to figure it out.
Within our organization, we have a FreeRadius Accounting system logging sessions from Wi-Fi usage. Our team is responsible for the data analysis of this accounting data.
Recently, we had to dump the RADIUS accounting database and make a freeze frame of it. While doing so we found some weird behavior.
Running the same query before and after the dump (a query that retrieves the total number of sessions for a single day) gave different amounts, with a difference of around 5-10%.
Looking a bit deeper we discovered that several updates were being issued that altered the start time of sessions after they had been first registered in the accounting database.
We then found that data we had collected previously showed disparities even after weeks or months (with discrepancies of around 2-10%).
TLDR:
Does FreeRadius adjust the start times of sessions based on some maintenance? Are WiFi controllers allowed to do this? Is it a bug?
Overall we just want to understand the rationale so we can justify the data and adjust our processing correctly, as currently we cannot trust the values we collect daily or even weekly for these stats!
Any help or insight would be great!!!
FreeRADIUS only updates the database as a result of data in an incoming RADIUS packet, using the SQL queries in the local configuration. The only real way to understand this is to look at your SQL queries, and incoming requests (via radiusd -X) and see what is making changes to the data. It is possible that the NAS is broken and sending invalid or changing data, or possibly re-using session IDs which overwrite existing records.
It is also possible to configure FreeRADIUS to create a "fake" accounting start entry in the database in post-auth, which will then be updated when the real Start packet arrives. If you are doing this then you should check the values that are being written, and also if the session never starts up (or the Start is lost) then bad things might happen.
But in all circumstances the only solution you really have is to look at the debug output and see what is happening and why data is being written in the way that it is. There is nothing in FreeRADIUS that randomly updates the database without being sent that data from the NAS.
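As a starting point on the database side, a query along these lines can surface re-used session IDs. A minimal sketch assuming the stock FreeRADIUS MySQL schema (radacct table); host, credentials and database name are placeholders:

import pymysql

conn = pymysql.connect(host="localhost", user="radius",
                       password="radpass", database="radius")
with conn.cursor() as cur:
    # Sessions sharing an Acct-Session-Id are candidates for the
    # "NAS re-uses session IDs and overwrites records" failure mode.
    cur.execute("""
        SELECT acctsessionid, COUNT(*) AS n,
               MIN(acctstarttime) AS first_start,
               MAX(acctstarttime) AS last_start
        FROM radacct
        GROUP BY acctsessionid
        HAVING COUNT(*) > 1
        ORDER BY n DESC
        LIMIT 20
    """)
    for row in cur.fetchall():
        print(row)
conn.close()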

Is it possible to receive a webhook to my app before Heroku Postgres goes read-only?

I have an application that handles some data in memory.
I'd like to close the operations and persist the data into the DB so that a reboot wouldn't destroy it.
My app opens some resources in various third-party services, and I'd like to close them. After that the app can happily go down and wait until it reboots.
What I found is that Heroku has various webhooks for application deployment state changes and so on, but I couldn't find a way to trigger a webhook before the DB becomes read-only.
I would like a webhook that tells me "in 5 minutes PostgreSQL will become read-only". The app can then reboot later; for now that doesn't matter.
I also couldn't find any info on whether this is even possible, nor an email address for support.
Is there a way to do it? Is it even possible?
(I have an Event-Sourced app that saves event data into DB but persists the data in-memory as it runs. So I don't want to continuously bash all of my state into the DB).
It sounds like there is some confusion in your understanding of the various parts of dyno and database uptime on Heroku.
Firstly, a database going into read-only mode is a very rare event, usually associated with a critical failure. Based on the behavior you're seeking and some of your comments, it seems you may be confusing database state changes with dyno state changes. Dynos (the servers for your application runtime) are restarted roughly once per 24 hours, and these servers are ephemeral, so their memory is blown away. The 'roughly' accounts for fuzzing, so that all of your dynos don't restart at the same time, which would cause availability issues.
I don't think you actually need a webhook here. Conveniently, shortly before a dyno is due to be cycled (and blow away your memory) it will receive a SIGTERM and be given 30 seconds to clean up after itself. That SIGTERM can be trapped and you can then save your data to the database.
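A minimal sketch of that SIGTERM trap in Python; persist_state and close_third_party_resources are hypothetical stand-ins for your app's own cleanup logic:

import signal
import sys

def persist_state():
    # Hypothetical: flush the in-memory event-sourced state to Postgres.
    pass

def close_third_party_resources():
    # Hypothetical: close the resources opened with third parties.
    pass

def handle_sigterm(signum, frame):
    # Heroku sends SIGTERM and waits about 30 seconds before SIGKILL
    # when a dyno is cycled; finish all cleanup within that window.
    persist_state()
    close_third_party_resources()
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)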

CDC log retention for Informix

Actual Situation:
We use IBM Data Replication (11.4) to replicate data from an Informix database to a SQL Server database.
Now we have an instance with 45 different subscriptions. On the Informix side, we have 30 different log files.
The Problem:
When we want to “refresh” all subscriptions at once, we run into trouble because some logs aren't available anymore; they have been overwritten.
The problem is that these logs were not 100 percent full, but only approximately 0.5% full.
I don't know exactly when a new log file will be created.
Is there any way to change the settings that control when a new log file is created, for example so that a new log file is only created once the current one is 100% full? Or do you have another solution to this problem?
We have found the problem:
The parameter log_api_switch_log_num_pages has to be defined. It controls log switching after a refresh.
See details here:
http://www-01.ibm.com/support/docview.wss?uid=swg21997830
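For reference, a hypothetical sketch of defining such a CDC system parameter with the dmset utility (the instance name and page count here are placeholders; the technote above describes the exact procedure and an appropriate value):

dmset -I my_instance log_api_switch_log_num_pages=2048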

How can I get the result of a Dask compute on a different machine than the one that submitted it?

I am using Dask behind a Django server and the basic setup I have is summarised here: https://github.com/MoonVision/django-dask-demo/ where the Dask client can be found here: https://github.com/MoonVision/django-dask-demo/blob/master/demo/daskmanager/daskmanager.py
I want to be able to separate the saving of a task from the server that submitted it, for robustness and scalability. I would also like more detailed information on the processing status of the task; right now the future status is always pending, even if the task is processing. Having a rough estimate of percent complete would also be great.
Right now, if the web server were to die, the client would get deleted and the task would stop as no client is still holding the future. I can get around this by using fire_and_forget but I then have no way to save the task status and result when it completes.
Ways I see to track the status and save the result after a fire_and_forget:
1. I could have a scheduler plugin that sends all transitions to an AMQP server (RabbitMQ). I like the robustness, and being able to subscribe to certain messages output by the scheduler, knowing every message will be processed. I'm not sure how I could get the result itself with this method. I could manually add a node to the end of every graph to save the result, but I would rather have it happen behind the scenes.
2. Use get_task_stream on a separate server, or use it in some way. With this, it seems I could miss some messages if the server were to go down, so it seems like a worse version of option 1.
3. Something else?
What would be the best way to accomplish this?
Edit: Just tested and it seems when the client that submitted a task shuts down, all futures it created are moved from processing to forgotten, even if calling fire_and_forget.
You probably want to look at Dask's coordination primitives, like Queues and Pub/Sub. My guess is that putting your futures into a queue would solve your problem.
https://docs.dask.org/en/latest/futures.html#coordination-primitives
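A minimal sketch of the queue hand-off, assuming a running dask.distributed cluster; the scheduler address and queue name are placeholders:

from dask.distributed import Client, Queue, fire_and_forget

# --- submitting process (e.g. the Django web server) ---
client = Client("tcp://scheduler:8786")
future = client.submit(sum, [1, 2, 3])
# Parking the future in a named Queue keeps it alive on the cluster
# even after this client disconnects.
Queue("results-to-save", client=client).put(future)
fire_and_forget(future)

# --- consuming process (a separate "saver" service) ---
saver = Client("tcp://scheduler:8786")
result_future = Queue("results-to-save", client=saver).get()
print(result_future.result())  # 6 -- persist it to the database here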

neo4j is storing arbitrary files on drive C?

My C drive size is growing, and my server is not running anything but Neo4j, even though I configured Neo4j to store database information on another drive.
The node count might be irrelevant, but for the record, I have almost 10 million nodes and traffic to the database of about 200 requests/minute.
Is there anything else written by Neo4j that I should be aware of?
dbms.directories.data=E:/MyNeoDB4/
dbms.directories.logs=E:/MyNeoDb4
dbms.jvm.additional=-Dunsupported.dbms.udc.source=zip
dbms.memory.heap.initial_size=15G
dbms.memory.heap.max_size=15G
dbms.security.procedures.unrestricted=apoc.*
dbms.memory.pagecache.size=8G
Update 1:
Things I have checked already:
- my debug log is being written somewhere other than drive C
- metrics.enabled=false
Update 2:
- As @InverseFalcon suggested, I also checked the transaction logs in the first step; they were being written to some other directory.
(Note: Answer was written before original question was updated to say that neither metrics nor logs were the likely culprits)
Logs, and possibly metrics
I'm not sure what your logging needs have been like, but a major source of disk consumption that is not the data itself is the writing of log files. They typically do not grow extremely quickly, but it totally depends on your setup.
I suspect that your drive may be filling up with logs, although I am surprised it's filling up so quickly. I would check out your log files and see if they are full of long chains of exceptions.
It could also be metrics being exported to CSV on the local disk, although I do not believe that Neo4J will do that without being explicitly configured to do so.
More info on metrics is at the official docs:
https://neo4j.com/docs/operations-manual/current/monitoring/metrics/
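For completeness, CSV metrics export is governed by settings along these lines in neo4j.conf (setting names as in the Neo4j 3.x operations manual; verify against your version):

metrics.enabled=false
metrics.csv.enabled=false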
As a variant on Rebecca Nelson's answer, you might want to check for transaction log files.
Transaction logs are the source of truth for changes made to a database, and they are not the same kinds of logs as the readable log files (debug.log, neo4j.log) that live in the logs folder.
You can find transaction logs in your graph.db folder (or whatever name you've given to your graph database folder) using the naming pattern neostore.transaction.db.0 (with incremental numbering of the log files starting with 0).
Transaction logs are a stage of data persistence. Transactions affecting the database first write to these logs. When criteria are met, a checkpoint operation occurs which flushes the contents of the transaction logs to the datastore files (some of the other files in the graph.db folder) and the transaction logs are pruned and/or rotated.
While you should not modify or delete transaction log files yourself, you can add configuration parameters in neo4j.conf to control how these files are handled.
Here are the docs dealing with transaction logs.
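If the transaction logs do turn out to be the culprit, their retention can be tightened in neo4j.conf. A hypothetical sketch (setting names as in Neo4j 3.x; check the docs for your version, and note that aggressive pruning can affect backups and cluster catch-up):

dbms.tx_log.rotation.retention_policy=1 days
dbms.tx_log.rotation.size=250M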
