Does Erlang's ssh_sftp library provide a way to listen for directory changes?

I need to watch or listen to a folder on an SFTP server for any changes. At some given time in the future (I don't know when), the folder will be updated with a file. Instead of pinging every minute, how would I set up a listener or watcher on that folder so I know when it has that file? Does Erlang's ssh_sftp module provide a function for this?

Neither the SFTP nor the FTP protocol has any mechanism to notify a client about changes in a remote folder. The only way to detect changes is to periodically enumerate the remote directory tree and find differences.
Ref: https://winscp.net/eng/docs/library_example_watch_for_changes
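A minimal polling sketch, assuming an already-open ssh_sftp channel; the module name, interval handling, and io:format reporting are illustrative only:

-module(sftp_poll).
-export([watch/3]).

%% Enumerate Dir on the given ssh_sftp channel every Interval ms and
%% print any names that were not present on the previous pass.
watch(ChannelPid, Dir, Interval) ->
    watch(ChannelPid, Dir, Interval, ordsets:new()).

watch(ChannelPid, Dir, Interval, Known) ->
    {ok, Names} = ssh_sftp:list_dir(ChannelPid, Dir),
    Current = ordsets:from_list(Names),
    [io:format("new entry: ~s~n", [N])
     || N <- ordsets:subtract(Current, Known)],
    timer:sleep(Interval),
    watch(ChannelPid, Dir, Interval, Current).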

I don't know whether you still need it or not, but I have already implemented what you need. I won't be able to share the code due to legal agreements; however, I can hint at the way to implement it.
Requirement: Run a function whenever a file lands in a directory, instead of polling the directory at a defined interval, to make processing efficient.
Step 1: Start the ssh daemon process with sftp as a subsystem
{ok, Pid} = ssh:daemon(Port, [{user_passwords, [{User, Pass}]},
                              {system_dir, SystemDir},
                              {user_dir, UserDir},
                              {key_cb, cac_auth},
                              {shell, {cacsftpd_server, display_info, []}},
                              {subsystems,
                               [ssh_sftpd:subsystem_spec(
                                  [{cwd, Home},
                                   {file_handler, sftpd_file_handler}])]}]).
Step 2: In sftpd_file_handler you can apply the trigger to call your desired handler once the file has been completely transferred ;)).
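A minimal sketch of such a handler module, assuming it delegates to the file module the way the default ssh_sftpd_file handler does; the module name and notify_file_complete/1 are assumptions, and most ssh_sftpd_file_api callbacks are omitted for brevity:

-module(sftpd_file_handler).
-behaviour(ssh_sftpd_file_api).

%% Only the callbacks relevant here are shown; a real handler must
%% implement the rest of the ssh_sftpd_file_api behaviour as well.
-export([open/3, write/3, close/2]).

open(Path, Flags, State) ->
    {file:open(Path, Flags), State}.

write(IoDevice, Data, State) ->
    {file:write(IoDevice, Data), State}.

close(IoDevice, State) ->
    Res = file:close(IoDevice),
    notify_file_complete(State),  %% hypothetical trigger
    {Res, State}.

%% hypothetical: e.g. send a message to whoever is waiting for the file
notify_file_complete(_State) ->
    ok.

Triggering from close/2 rather than write/3 avoids firing once per chunk on large uploads.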

While in Elixir rather than Erlang, take a look at: https://github.com/Codenaut/exsftpd/blob/master/lib/sftpd_file_handler.ex
The essential bit is the write function:
defmodule Exsftpd.SftpFileHandler do
  ...
  def write(io_device, data, state) do
    result = :file.write(io_device, data)
    my_custom_function(state)
    {result, state}
  end
  ...
end
Here you can call whatever function you desire after (or instead of) writing the content to the filesystem.
Initialise the server like so (see https://github.com/Codenaut/exsftpd/blob/master/lib/server.ex for a more elaborate example):
:ssh.daemon(2222,
  system_dir: '/tmp/ssh',
  subsystems: [
    Exsftpd.SftpdChannel.subsystem_spec(
      file_handler: {Exsftpd.SftpFileHandler, []}
    )
  ],
  user_dir: '/tmp/ssh/users'
)

Related

Best way to log access to web pages

One of my websites uses Nitrogen with a Cowboy server.
I would like to log every access to web pages, just like Apache does with access.log.
What would be the best way to do that?
You can use Cowboy middlewares: https://ninenines.eu/docs/en/cowboy/1.0/guide/middlewares/
Just create a simple log module:
-module(app_web_log).
-behaviour(cowboy_middleware).

-export([execute/2]).

execute(Req, Env) ->
    {{Peer, _}, Req2} = cowboy_req:peer(Req),
    {Method, Req3} = cowboy_req:method(Req2),
    {Path, Req4} = cowboy_req:path(Req3),
    error_logger:info_msg("~p: [~p]: ~p ~p",
                          [calendar:universal_time(), Peer, Method, Path]),
    {ok, Req4, Env}.
and add it to the list of middlewares:
{ok, _} = cowboy:start_http(http, 100, [{port, 8080}], [
    {env, [{dispatch, Dispatch}]},
    {middlewares, [cowboy_router, app_web_log, cowboy_handler]}]).
Try using Nitrogen on top of the Yaws web server instead, since it performs access logging by default.
Each underlying webserver does it differently (or not at all) - this is something simple_bridge does not yet have abstracted.
So in the case of cowboy, you'll likely have to rig it up yourself.
If you're using a newer build of Nitrogen (if you have the file site/src/nitrogen_main_handler.erl), then you can edit that file to log manually yourself. For example, using Erlang's error_logger, you could add something simple like:
log_request() ->
    error_logger:info_msg("~p: [~p]: ~p",
                          [{date(), time()}, wf:peer_ip(), wf:url()]).

run() ->
    handlers(),
    log_request(), %% <--- insert before wf_core:run()
    wf_core:run().
Then whatever happens with the log can be handled by configuring error_logger to write to disk (http://erldocs.com/17.0/kernel/error_logger.html?i=13&search=error_logger#logfile/1), for example:
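A one-liner during startup is enough to send error_logger output to a file (the path here is just an example):

%% somewhere in your startup code
error_logger:logfile({open, "log/access.log"}).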
If you use an older Nitrogen (which would have site/src/nitrogen_cowboy.erl), then you would similarly edit that file, once again before the wf_core:run() call.
Alternatively, your hooks option with cowboy could work as well. I've not worked with them, so you're on your own there :)

Pass some arguments to supervisor init function when app starts

I want to pass some arguments to the supervisor:init/1 function, and ideally the application's interface would look like this:
redis_pool:start() % start all instances
redis_pool:start(Names) % start only given instances
Here is the application:
-module(redis_pool).
-behaviour(application).
...

start() -> % start without params
    application:ensure_started(?APP_NAME, transient).

start(Names) -> % start with some params
    % I want to pass Names to supervisor init function
    % in order to do that I have to bypass application:ensure_started
    % which is not GOOD :(
    application:load(?APP_NAME),
    case start(normal, [Names]) of
        {ok, _Pid} -> ok;
        {error, {already_started, _Pid}} -> ok
    end.

start(_StartType, StartArgs) ->
    redis_pool_sup:start_link(StartArgs).
Here is the supervisor:
init([]) ->
    {ok, Config} = get_config(),
    Names = proplists:get_keys(Config),
    init([Names]);
init([Names]) ->
    {ok, Config} = get_config(),
    PoolSpecs = lists:map(fun(Name) ->
        PoolName = pool_utils:name_for(Name),
        {[Host, Port, Db], PoolSize} = proplists:get_value(Name, Config),
        PoolArgs = [{name, {local, PoolName}},
                    {worker_module, eredis},
                    {size, PoolSize},
                    {max_overflow, 0}],
        poolboy:child_spec(PoolName, PoolArgs, [Host, Port, Db])
    end, Names),
    {ok, {{one_for_one, 10000, 1}, PoolSpecs}}.
As you can see, the current implementation is ugly and may be buggy. The question is: how can I pass some arguments and start the application and the supervisor (with the params that were given to start/1)?
One option is to start the application and run the redis pools in two separate phases:
redis_pool:start(),
redis_pool:run([] | Names).
But what if I want to run the supervisor's children (the redis pools) when my app starts?
Thank you.
The application callback Module:start/2 is not an API to call in order to start the application. It is called when the application is started by application:start/1,2. This means that overloading it to provide differing parameters is probably the wrong thing to do.
In particular, application:start will be called directly if someone adds your application as a dependency of theirs (in the foo.app file). At this point, they have no control over the parameters, since they come from your .app file, in the {mod, {Mod, Args}} term.
Some possible solutions:
Application Configuration File
Require that the parameters be in the application configuration file; you can retrieve them with application:get_env/2,3. For example:
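A sketch, assuming the app is named redis_pool and the pool names live under a pools key (both names are illustrative):

%% sys.config
[{redis_pool, [{pools, [default, cache]}]}].

%% in the supervisor
init([]) ->
    {ok, Names} = application:get_env(redis_pool, pools),
    init([Names]);  %% falls through to the asker's init([Names]) clause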
Don't start a supervisor
This means one of two things: becoming a library application (removing the {mod, Mod} term from your .app file), in which case you don't need an application behaviour; or starting a dummy supervisor that does nothing.
Then, when someone wants to use your library, they can call an API to create the pool supervisor, and graft it into their supervision tree. This is what poolboy does with poolboy:child_spec.
Or, your application-level supervisor can be a normal supervisor, with no children by default, and you can provide an API to start children of that, via supervisor:start_child. This is (more or less) what cowboy does. A sketch of that approach follows.
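A minimal sketch of the empty-supervisor approach, reusing poolboy:child_spec/3 from the question; the module and function names are assumptions:

-module(redis_pool_sup).
-behaviour(supervisor).

-export([start_link/0, start_pool/3]).
-export([init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% public API: start one pool under the supervisor on demand
start_pool(Name, PoolArgs, WorkerArgs) ->
    supervisor:start_child(?MODULE,
                           poolboy:child_spec(Name, PoolArgs, WorkerArgs)).

init([]) ->
    %% no children by default
    {ok, {{one_for_one, 10, 1}, []}}.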
You can pass arguments in the AppDescr argument to application:load/1 (though it's a mighty big tuple already...) as {mod, {Module, StartArgs}} according to the docs ("according to the docs" as in, I don't recall doing it this way myself, ever: http://www.erlang.org/doc/apps/kernel/application.html#load-1).
application:load({application, some_app, [{mod, {Module, [Stuff]}}]})
Without knowing anything about the internals of the application you're starting, it's hard to say which way is best, but a common way to do this is to start up the application and then send it a message containing the data you want it to know.
You could make receipt of the message tell the application to go through a configuration assertion procedure, so that the same message you send on startup is also the same sort of thing you would send to reconfigure it on the fly. I find this more useful than one-shotting arguments on startup.
In any case, it is usually better to think in terms of starting something, then asking it to do something for you, than to try telling it everything in init parameters. This can be as simple as having it start up and wait for some message that will tell the listener to then spin up the supervisor the way you're trying to here, isolated one step from the application inclusion issues RL mentioned in his answer. A sketch of the pattern:
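A minimal sketch of that start-then-configure pattern; every name here (redis_pool_mgr, pool_spec/1, the child spec contents) is an assumption, not an existing API:

-module(redis_pool_mgr).
-behaviour(gen_server).

-export([start_link/0, configure/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% the same call works at startup and for reconfiguring on the fly
configure(Names) ->
    gen_server:call(?MODULE, {configure, Names}).

init([]) ->
    {ok, []}.

handle_call({configure, Names}, _From, _State) ->
    %% start a pool child under the supervisor for each name;
    %% redis_pool_sup and pool_spec/1 are hypothetical
    [supervisor:start_child(redis_pool_sup, pool_spec(N)) || N <- Names],
    {reply, ok, Names}.

handle_cast(_Msg, State) ->
    {noreply, State}.

pool_spec(Name) ->
    %% placeholder; the real spec would come from poolboy:child_spec/3
    #{id => Name, start => {eredis, start_link, []}}.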

Default process flags

Is there a way to instruct the Erlang VM to apply a set of process flags to every new process that is spawned in the system?
For example in testing environment I would like every process to have save_calls flag set.
One way for doing this is to combine the Erlang tracing functionalities with a .erlang file.
Specifically, you could either use the low-level tracing capabilities provided by erlang:trace/3 or you could simply exploit the dbg:tracer/2 function to create a new tracing process which executes your custom handler function every time a tracing message is received.
To automate things a bit, you could then create an Erlang Start Up File in the directory where you're running your code or in your home directory. The Erlang Start Up File is a special file, called .erlang, which gets executed every time you start the run-time system.
Something like the following should do the job:
% -*- Erlang -*-
erlang:display("This is automatically executed.").
dbg:tracer(process, {fun ({trace, Pid, spawn, Pid2, {M, F, Args}}, Data) ->
                             process_flag(Pid2, save_calls, Data),
                             Data;
                         (_Trace, Data) ->
                             Data
                     end, 100}).
dbg:p(new, [procs, sos]).
Basically, I'm creating a new tracing process which will trace processes (the first argument). I'm specifying a handler function to get executed and some initial data. In the handler function, I'm setting the save_calls flag for newly spawned processes, whilst ignoring all other tracing messages. I set the save_calls option to 100, using the initial data parameter. In the last call, I'm telling dbg that I'm interested only in newly created processes. I'm also setting the sos (set_on_spawn) option to ensure inheritance of the tracing flags.
Finally, note how you need to use a variant of the process_flag function, which takes an extra argument (the Pid of the process you want to set the flag for).

Yaws process died

I do performance testing of our Erlang application with OpenSTA. The test runs with 100 virtual users. At some point the following errors start popping up:
Yaws process died: {{badmatch,{error,eacces}},
                    [{yaws_server,ut_read,1},
                     {yaws_server,deliver_dyn_file,5},
                     {yaws_server,aloop,3},
                     {yaws_server,acceptor0,2},
                     {proc_lib,init_p_do_apply,3}]}
The test continues to run. I cannot find info about this error. Does eacces mean "error accessing a resource"?
EDIT: As pointed out by @Muzaaya Joshua, the call file:read_file(UT#urltype.fullpath) crashes in the function ut_read(UT). I recompiled the module and printed the context. The error is eacces and UT holds:
{urltype,yaws,
         {file_info,14088,regular,read_write,
                    {{2011,9,13},{11,51,42}},
                    {{2011,10,17},{17,59,44}},
                    {{2011,3,16},{13,18,58}},
                    33206,1,3,0,0,0,0},
         "/handler.yaws",
         "c:/Temp/harmony/script/../www/handler.yaws",
         "/",undefined,undefined,"text/html",
         "/handler.yaws",undefined}
This file handler.yaws is the entry point of our app and is called on every request. When I run the test with 100 or fewer virtual users I don't see these errors. So how can it be "Missing permission for reading the file, or for searching one of the parent directories", as the error is described in the read_file documentation?
Thanks in advance.
Martin
eacces means access denied; the error codes are described at the end of the file documentation: http://www.erlang.org/doc/man/file.html#write_file_info-2
Yaws has failed to open a file. This is a file read permission denied to the user running the Yaws application. If you give the user running Yaws ownership of all the folders concerning Yaws, this may be fixed. To check it out, try running Yaws as root, if at all these paths are owned by root. The yaws_server.erl source at that point is as follows:
ut_read(UT) ->
    ?Debug("ut_read() UT.fullpath = ~p~n", [UT#urltype.fullpath]),
    case yaws:outh_get_content_encoding() of
        identity ->
            case UT#urltype.data of
                undefined ->
                    ?Debug("ut_read reading\n",[]),
                    {ok, Bin} = file:read_file(UT#urltype.fullpath),
                    ?Debug("ut_read read ~p\n",[size(Bin)]),
                    Bin;
                B when is_binary(B) ->
                    B
            end;
        deflate ->
            case UT#urltype.deflate of
                B when is_binary(B) ->
                    ?Debug("ut_read using deflated binary of size ~p~n",
                           [size(B)]),
                    B
            end
    end.
The file:read_file/1 line in the above source is where the bad match occurs.
Check whether Yaws is running as a user with permissions to access its folders, such as the doc root, the ssl folders and other paths where Yaws may access files. Does the user running Yaws have access to all required files?
On Windows, this is easy. Go to your local disk C:, then Program Files (on Windows 7 this could be Program Files (x86)), and finally to where Yaws is installed. This will be a folder named Yaws-VERSION, e.g. Yaws-1.89. Right-click it and select Properties, then in the pop-up window select Security. Under Security, click Edit. Under Group or user names, select each user (and each type of account) and give it all permissions, i.e. read, write, full control, etc. Click Apply, wait for Windows to perform the changes, then click OK and close. Your users should now have all required permissions.
I managed to fix these errors by increasing the allowed size of cached files in the Yaws configuration (max_size_cached_file). This made our .yaws file be loaded in memory instead of being accessed with file:read_file all the time. Hopefully this will save someone else a couple of hours (or days :))
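For reference, max_size_cached_file is a global directive in yaws.conf; a limit large enough to cover the .yaws file keeps it in the memory cache (the value below is just an example, in bytes):

# yaws.conf
max_size_cached_file = 65536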

Spawn remote process w/o common file system

(nodeA@foo.hyd.com)8> spawn('nodeA@bar.del.com', tut, test, [hello, 5]).
I want to spawn a process on bar.del.com, which has no file system access to foo.hyd.com (from where I am spawning the process), running the subroutine "test" of module "tut".
Is there a way to do so, without providing nodeA@bar.del.com with the compiled "tut" module file?
You can use the following function to load a module at remote node without providing the file itself:
load_module(Node, Module) ->
    {_Module, Bin, Filename} = code:get_object_code(Module),
    rpc:call(Node, code, load_binary, [Module, Filename, Bin]).
As noted in the code:load_binary/3 documentation, the Filename argument is only used to track the path to the module; the file it points to is not read by the local node's code server.
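Usage from the shell would then look something like this (the node name is an example):

> load_module('nodeA@bar.del.com', tut).
{module,tut}
> spawn('nodeA@bar.del.com', tut, test, [hello, 5]).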
I'm interpreting your question as a desire not to copy the *.beam files from your file system to the remote file system.
If you are just testing things you can use the nl(Mod) call in the erl shell to load the module on all (currently) known nodes. That is, those that show up in nodes().
This will send the code over and load it from the memory copy, it will not store it in the remote filesystem.
You can also start the remote node using the slave module. A slave accesses its master's file system and code server. Ordinary auto-loading will then make sure the module exists on the slave when you call your tut:test/2 function, as in the sketch below.
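A sketch with slave:start/2; the host and node names are examples, and the remote host must be reachable (e.g. via ssh) and share the Erlang cookie:

{ok, Node} = slave:start('bar.del.com', nodeA),
spawn(Node, tut, test, [hello, 5]).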
You can send the local code to the remote node:
> {Mod, Bin, File} = code:get_object_code(Module).
> rpc:call(RemoteNode, code, load_binary, [Mod, File, Bin]).
