Dynamically load config file in Emqttd - erlang

Load Config File Dynamically in Emqttd.
Can I make changes to the emq.conf file and somehow load them without stopping the broker?

I think a better way is to use a gen_server together with an ETS table to store/update the data periodically (for example, every 30 seconds).
You can use erlang:send_after inside the handle_info callback and re-read the Emqttd config file there.
See the answers in this topic: What's the best way to do something periodically in Erlang?
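As a rough illustration, here is a minimal sketch of such a reloader. The module name, table name, file path, and the use of file:consult/1 as the parser are all assumptions, not Emqttd API:

-module(conf_reloader).
-behaviour(gen_server).
-export([start_link/0, lookup/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).
-define(INTERVAL, 30000). %% reload every 30 seconds

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

lookup(Key) ->
    ets:lookup(?MODULE, Key).

init([]) ->
    ets:new(?MODULE, [named_table, public, set]),
    self() ! reload,
    {ok, undefined}.

handle_info(reload, State) ->
    %% file:consult/1 only works for Erlang-term config files; adapt the
    %% parsing step to whatever format your emq.conf actually uses.
    case file:consult("etc/emq.conf") of
        {ok, Terms}      -> ets:insert(?MODULE, Terms);
        {error, _Reason} -> ok
    end,
    erlang:send_after(?INTERVAL, self(), reload),
    {noreply, State};
handle_info(_Other, State) ->
    {noreply, State}.

handle_call(_Req, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.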

Related

Is there any way to avoid reading old files from an existing folder with Apache Beam's TextIO watchForNewFiles(Duration, condition)?

Use case: at Dataflow job startup we provide an initial file name to read, and later the job should watch for new files in that directory while considering all remaining old files as already read.
Issues:
Approach 1:
PCollection<String> readfile = pipeline.apply(TextIO.read().from("gs://folder-Name/*").
watchForNewFiles(Duration.standardSeconds(10),
Watch.Growth.afterTimeSinceNewOutput(Duration.standardSeconds(30))));
With this approach the job treats the old files as new and reads every file in the folder.
Approach 2:
PCollection<String> readfile = pipeline.apply(TextIO.read().from("gs://folder-Name/file-name").
watchForNewFiles(Duration.standardSeconds(10),
Watch.Growth.afterTimeSinceNewOutput(Duration.standardSeconds(30))));
This reads only that particular file and never picks up the new files that arrive later.
Can anyone suggest an approach that achieves this use case?
The watchForNewFiles() function will always read all files matching the filepattern, both existing and new. In your second approach, the file pattern is only one file, so you just get that.
However, you can use the lower-level building block transforms in FileIO to accomplish what you need. The following code will just read files written after the pipeline starts:
PCollection<String> lines = p
    .apply(FileIO.match().filepattern("gs://folder-Name/*")
        .continuously(Duration.standardSeconds(30),
            Watch.Growth.afterTimeSinceNewOutput(Duration.standardHours(1))))
    .setCoder(MetadataCoderV2.of())
    .apply(Filter.by(metadata -> metadata.lastModifiedMillis() > PIPELINE_START))
    .apply(FileIO.readMatches())
    .apply(TextIO.readFiles());
You can change the details of the Filter transform to whatever precise condition you need. To also include specific older files, you can read those with a standard TextIO.read().from(...) and then use Flatten to combine that PCollection with the continuous set. Like this:
PCollection<String> allLines =
    PCollectionList.of(lines)
        .and(p.apply(TextIO.read().from("gs://folder-Name/file-name")))
        .apply(Flatten.pCollections());
Maybe you need to clarify your use case. Do you provide a file name to read, or a file pattern? How many files are expected? Do you really need a Dataflow streaming pipeline, or would a Cloud Function answer your need? And what exactly is your issue: that files get read again when you restart your pipeline?
You can, as danielm suggested, use FileIO to fetch and filter on file metadata in order to know which files were added after the pipeline began.
If you provide a file pattern, then every file matching it will be read once by the pipeline. There is no way to keep state between pipelines unless you code it yourself, so when you restart the pipeline you will read all the files matching the pattern again.
If you want to avoid that, you can manually move old files to another path between stopping the old pipeline and starting a new one.
You could also consider consuming GCS notifications on file creation with PubsubIO and using those events to know which files to process in your pipeline.
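A minimal sketch of that idea, assuming a Pub/Sub subscription (the name is illustrative) that receives Cloud Storage OBJECT_FINALIZE notifications:
PCollection<String> newFilePaths = p
    .apply(PubsubIO.readMessagesWithAttributes()
        .fromSubscription("projects/my-project/subscriptions/gcs-notifications"))
    // Keep only "object created" notifications and rebuild the gs:// path
    // from the bucketId/objectId attributes carried by GCS notifications.
    .apply(Filter.by(msg -> "OBJECT_FINALIZE".equals(msg.getAttribute("eventType"))))
    .apply(MapElements.into(TypeDescriptors.strings())
        .via(msg -> "gs://" + msg.getAttribute("bucketId") + "/" + msg.getAttribute("objectId")));

PCollection<String> newLines = newFilePaths
    .apply(FileIO.matchAll())
    .apply(FileIO.readMatches())
    .apply(TextIO.readFiles());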
A good practice, though, is to have multiple folders that reflect the status of the files:
input
processing
failed
succeed
This way you know the state of each file: you put the files to process in the input folder, and inside your pipeline you move each file to its corresponding state folder, for instance with a small step like the sketch below.
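A hedged sketch of such a move step using Beam's FileSystems utility; the bucket and folder names are illustrative, and it assumes the element is the bare file name:
static class MoveToSucceedFn extends DoFn<String, Void> {
  @ProcessElement
  public void processElement(@Element String fileName) throws IOException {
    ResourceId src = FileSystems.matchNewResource("gs://folder-Name/input/" + fileName, false);
    ResourceId dst = FileSystems.matchNewResource("gs://folder-Name/succeed/" + fileName, false);
    // Rename is a move on GCS; run this only after the file is fully processed.
    FileSystems.rename(Collections.singletonList(src), Collections.singletonList(dst));
  }
}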

Symfony 1: Dynamic database connection configuration

Sorry to trouble you this much. I just want to know whether it is possible to have dynamic data in the databases.yml file, so that I can shift my connection to another database whenever I want. If it is possible, can you please elaborate on how it is done, what the process involves, and why it is needed?
I would say it is not possible to have something like variables inside the databases.yml file without resorting to dirty tricks on the command line. But you can define multiple connections (as is done for testing), or you can load databases.yml at runtime,
$file = sfConfig::get('sf_config_dir').'/databases.yml';
$content = file_get_contents($file);
and change it with a preg_replace for the dsn. This should work, but I would really not use it.
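For completeness, a minimal sketch of that approach; the regex and the replacement DSN are illustrative and depend on how your databases.yml is laid out:
$file = sfConfig::get('sf_config_dir').'/databases.yml';
$content = file_get_contents($file);
// Replace every dsn line with a new, hard-coded DSN (illustrative values).
$content = preg_replace('/^(\s*dsn:\s*).*$/m', '${1}mysql:host=localhost;dbname=other_db', $content);
file_put_contents($file, $content);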

Does Ruby on Rails have read stream for files?

Does Rails have a way to implement read streams like Node.js has for file reading?
i.e.
fs.createReadStream(__dirname + '/data.txt');
as opposed to
fs.readFile(__dirname + '/data.txt');
Where I see ruby has
file = File.new("data.txt")
I am unsure of the equivalent in Ruby/Rails for creating a stream and would like to know if this is possible. The reason I ask is memory management, as a stream is delivered piece by piece as opposed to one whole file.
If you want to read a file in Ruby piece-by-piece, there are a host of methods available to you.
IO#each_line/IO::foreach, also implemented in File to iterate over each line of the file. Neither reads the whole file into memory; instead, both simply read up until the next newline, return, and pause reading, barring a possible buffer.
IO#read/IO::read takes a length parameter, which allows you to specify for it to read up to length bytes from the file. This will only read that many, and not the whole thing.
IO::binread does the same as IO::read, but will open the file in binary mode.
IO#readpartial appears to be very similar or identical to IO#read, but is also worth looking at.
IO#getc and IO#gets both read from the file until they reach the end of what they'll return, as far as I can tell.
There are several more along these lines, but the ones above cover the common cases.
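A small sketch of the first two options; the file name, chunk size, and process helper are placeholders:
# Line by line: only one line is held in memory at a time.
IO.foreach("data.txt") do |line|
  puts line
end

# Fixed-size chunks, closer in spirit to Node's createReadStream.
File.open("data.txt", "rb") do |io|
  while (chunk = io.read(4096))
    process(chunk) # placeholder for your own handling of each chunk
  end
end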

ChromeWorker to write a huge file

In my extension, I need to write a huge file (say around 20 GB) to disk. Currently I am doing it on the main thread, but file creation is a very expensive operation. I was about to move the whole file creation process to a ChromeWorker, but based on https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Functions_and_classes_available_to_workers I cannot access nsIFile from a ChromeWorker.
So my questions are:
1. Is it possible to access Cc, Ci, and Cu from within a ChromeWorker?
2. If not what would be the most efficient way to create and fill large files in Firefox. Note that I need to write the file based on segments and offsets (Ci.nsISeekableStream).
It's not possible to access nsIFile from a ChromeWorker, but nsIFile is a horrible synchronous option anyway.
Go with OS.File: https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/OSFile.jsm
On that page go to the link for usage on workers: https://developer.mozilla.org/docs/Mozilla/JavaScript_code_modules/OSFile.jsm/OS.File_for_workers
On the main thread, OS.File returns promises.
In a worker, the calls are synchronous. Wrap your OS.File calls in the worker with a try-catch: when an error occurs (for example OS.File.remove with the ignoreAbsent option set to false), the catch will receive an OS.File.Error object.
Great move to a ChromeWorker, by the way! I'm a huge fan of ChromeWorkers. I wrote a simple example of a JSM using a ChromeWorker here: https://github.com/Noitidart/jpm-chromeworker
For segments, you'll have to call OS.File.open and then call .setPosition() on the returned file object; from that position you can read or write a certain number of bytes. OS.File is the new, recommended way to do file operations, and it has been around for a while now, since roughly Firefox 29.
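A minimal worker-side sketch; the message shape, path, and offsets are assumptions:
importScripts("resource://gre/modules/osfile.jsm");

self.onmessage = function (msg) {
  // Assumed message shape: { path, offset, bytes }
  let { path, offset, bytes } = msg.data;
  let file;
  try {
    // OS.File calls are synchronous inside a ChromeWorker.
    file = OS.File.open(path, { write: true });
    file.setPosition(offset, OS.File.POS_START);
    file.write(new Uint8Array(bytes)); // write this segment at the given offset
    self.postMessage({ ok: true });
  } catch (ex) {
    // ex is an OS.File.Error when the failure comes from OS.File.
    self.postMessage({ ok: false, error: ex.toString() });
  } finally {
    if (file) {
      file.close();
    }
  }
};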

HDFS Flume sink - Roll by File

Is it possible for HDFS Flume sink to roll whenever a single file (from a Flume source, say Spooling Directory) ends, instead of rolling after certain bytes (hdfs.rollSize), time (hdfs.rollInterval), or events (hdfs.rollCount)?
Can Flume be configured so that a single file is a single event?
Thanks for your input.
Regarding your first question, it is not possible because the sink's logic is disconnected from the source's logic. A sink only sees events being put into the channel that it must process; the sink does not know whether an event is the first or the last one belonging to a file.
Of course, you could try to create your own source (or extend an existing one) in order to add a header to the event with a value meaning "this is the last event". Then another custom sink could behave depending on such a header: for instance, if the header is not set, the events are not persisted but kept in memory until the header is seen, and then all the information is persisted in the final backend as a batch. Another possibility is that the custom sink persists the data to a file until the header is seen; then the file is closed and another one is opened.
Regarding your second question, it depends on the source. The spooldir source behaves based on the deserializer parameter; by default its value is LINE, which means:
Specify the deserializer used to parse the file into events. Defaults to parsing each line as an event. The class specified must implement EventDeserializer.Builder.
But other custom Java classes can be configured, as said above; for instance, a deserializer for the whole file.
You can set hdfs.rollSize to a small number and combine it with the BlobDeserializer to load files one by one instead of combining them into blocks. This is really helpful when you have unsplittable binary files such as PDF or gz files.
This is part of the configuration that is relevant:
#Set deserializer to BlobDeserializer and set the maximum blob size to be 1GB.
#Notice that the blobs have to fit in memory so this doesn't work for files that cannot fit in memory.
agent.sources.spool.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
agent.sources.spool.deserializer.maxBlobLength = 1000000000
#Set rollSize to 1024 to avoid combining multiple small files into one part.
agent.sinks.hdfsSink.hdfs.rollSize = 1024
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.hdfs.rollInterval = 0
The answer to the question "Can Flume be configured so that a single file is a single event?" is yes.
You only have to configure the following property to be 1:
hdfs.rollCount = 1
I'm still looking for a solution to your first question myself, because sometimes the file is too big and needs to be split into several chunks.
You can use any event headers in hdfs.path. ( https://flume.apache.org/FlumeUserGuide.html#hdfs-sink )
If you are using the Spooling Directory Source, you can enable putting the file name in the events using fileHeader/fileHeaderKey or basenameHeader/basenameHeaderKey ( https://flume.apache.org/FlumeUserGuide.html#spooling-directory-source ).
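A hedged configuration sketch combining the two; the agent and component names follow the earlier example, and the spool directory and HDFS path are illustrative:
# Put the source file's basename into an event header.
agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /var/spool/flume
agent.sources.spool.basenameHeader = true
agent.sources.spool.basenameHeaderKey = basename
# Use that header in the HDFS sink path so events from different source
# files end up under different directories.
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/%{basename}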
Can Flume be configured so that a single file is a single event?
It could be, but it is not recommended. The underlying implementation (protobuf) limits file sizes to 64 MB, and Flume events are meant to be small because of its architecture and design (fault tolerance, etc.).

Resources