In a dotnet core app, if one configures mssql as a sink via appsettings, how can you configure a "backup" sink such as a file? So if serilog cant write to a given sink, it tries to write to the 2nd.
Is the Selflog what I need to look at?
Serilog doesn't have a feature like that, as of this writing, but it should be possible to implement by creating your own Sink that wraps the primary sinks and manages the error handling to perform the fallback to "backup" sink(s).
You might want to look at the code of Serilog.Sinks.Async for inspiration, as it shows you a way of wrapping multiple sinks into one which could be a starting point.
Related
AuditTo is a Serilog feature that ensures synchronous write to the sink, with an exception thrown if change flushing fails.As the name implies, it is ideal for ensuring the security of audit data to be stored. Right now, I found File, Seq and RabbitMQ sinks supporting AuditTo writes. I couldn't find the SqlLite sink that I'm interested in ... :(
From the other side, we have WriteTo, which batches the log entries and writes them asynchronously. There are no exceptions; it's kind of fire and forget.No one cares whether the log entries are dropped by the connection or the target system's failure or unavailability.
I would like to implement sending the audit logs via AuditTo but also be able to switch the log configuration to WriteTo at the runtime. In the meantime, the app might still write the logs.
I saw that Serilog offers dynamic switching of the logging level via LoggingLevelSwitch.
Any suggestions, ideas, or solutions for such requirements?
So I am working on a little project that sets up a streaming pipeline using Google Dataflow and apache beam. I went through some tutorials and was able to get a pipeline up and running streaming into BigQuery, but I am going to want to Stream it into a full relational DB(ie: Cloud SQL). I have searched through this site and throughout google and it seems that the best route to achieve that would be to use the JdbcIO. I am a bit confused here because when I am looking up info on how to do this it all refers to writing to cloud SQL in batches and not full out streaming.
My simple question is can I stream data directly into Cloud SQL or would I have to send it via batch instead.
Cheers!
You should use JdbcIO - it does what you want, and it makes no assumption about whether its input PCollection is bounded or unbounded, so you can use it in any pipeline and with any Beam runner; the Dataflow Streaming Runner is no exception to that.
In case your question is prompted by reading its source code and seeing the word "batching": it simply means that for efficiency, it writes multiple records per database call - the overloaded use of the word "batch" can be confusing, but here it simply means that it tries to avoid the overhead of doing an expensive database call for every single record.
In practice, the number of records written per call is at most 1000 by default, but in general depends on how the particular runner chooses to execute this particular pipeline on this particular data at this particular moment, and can be less than that.
We have a streaming Dataflow pipeline running on Google Cloud Dataflow
workers, which needs to read from a PubSub subscription, group
messages, and write them to BigQuery. The built-in BigQuery Sink does
not fit our needs as we need to target specific datasets and tables
for each group. As the custom sinks are not supported for streaming
pipelines, it seems like the only solution is to perform the insert
operations in a ParDo. Something like this:
Is there any known issue with not having a sink in a pipeline, or anything to be aware of when writing this kind of pipeline?
There should not be any issues for writing a pipeline without a sink. In fact, a sink is a type of ParDo in streaming.
I recommend that you use a custom ParDo and use the BigQuery API with your custom logic. Here is the definition of the BigQuerySink, you can use this code as a starting point.
You can define your own DoFn similar to StreamingWriteFn to add your custom ParDo logic, which will write to the appropriate BigQuery dataset/table.
Note that this is using Reshuffle instead of GroupByKey. I recommend that you use Reshuffle, which will also group by key, but avoid unnecessary windowing delays. In this case it means that the elements should be written out as soon as they come in, without extra buffering/delay. Additionally, this allows you to determine BQ table names at runtime.
Edit: I do not recommend using the built in BigQuerySink to write to different tables. This suggestion is to use the BigQuery API in your custom DoFn, rather than using the BigQuerySink
Not sure whether this is the right place to ask but I am currently trying to run a dataflow job that will partition a data source to multiple chunks in multiple places. However I feel that if I try to write to too many table at once in one job, it is more likely for the dataflow job to fail on a HTTP transport Exception error, and I assume there is some bound one how many I/O in terms of source and sink I could wrap into one job?
To avoid this scenario, the best solution I can think of is to split this one job into multiple dataflow jobs, however for which it will mean that I will need to process same data source multiple times (once for which dataflow job). It is okay for now but ideally I sort of want to avoid it if later if my data source grow huge.
Therefore I am wondering there is any rule of thumb of how many data source and sink I can group into one steady job? And is there any other better solution for my use case?
From the Dataflow service description of structuring user code:
The Dataflow service is fault-tolerant, and may retry your code multiple times in the case of worker issues. The Dataflow service may create backup copies of your code, and can have issues with manual side effects (such as if your code relies upon or creates temporary files with non-unique names).
In general, Dataflow should be relatively resilient. You can Partition your data based on the location you would like it output. The writes to these output locations will be automatically divided into bundles, and any bundle which fails to get written will be retried.
If the location you want to write to is not already supported you can look at writing a custom sink. The docs there describe how to do so in a way that is fault tolerant.
There is a bound on how many sources and sinks you can have in a single job. Do you have any details on how many you expect to use? If it exceeds the limit, there are also ways to use a single custom sink instead of several sinks, depending on your needs.
If you have more questions, feel free to comment. In addition to knowing more about what you're looking to do, it would help to know if you're planning on running this as a Batch or Streaming job.
Our solution to this was to write a custom GCS sink that supports partitions. Though with the responses I got I'm unsure whether that was the right thing to do or not. Writing Output of a Dataflow Pipeline to a Partitioned Destination
I am using Parse.com as the backend for my iOS app. Parse has a big Export Data button for backing up your database that will send an email with a zip containing each table and its data in JSON format. That's great, but is there any way to automate this task? I want to be able to do this every night, and I know you can use Background Jobs for automated tasks, but is it possible to hook into this particular feature? I couldn't find an answer on Parse's forums and it didn't turn up anything except old threads talking about how this feature was on the horizon.
The best I can work out, without Parse providing a true way of achieving this, is to have a job creating File objects in a "backup" table. And then use an external service (with the REST API) to pull this out into S3 or similar.
It's not ideal, but it would work. Also, it will count against your API requests so you may want to optimise with the updated flag.
What I do for this issue is I am running a simple Windows Server in the AWS EC2 to run a scheduled task.
Create simple bat file to run a command node parse-backup.js
Create basic scheduled task using windows provided scheduler and run bat file
You can use this node code. https://github.com/mkim871/parse-node-backup