How to rewrite a URL in Yaws/Erlang

How can I access a yaws file without including its extension? Say,
www.domain.com/listen.yaws => www.domain.com/listen
I could not find anything specific about this in the Yaws documentation or the appmod docs.

You can find one example of how to accomplish this in the "Arg Rewrite" section (7.1.2) of the Yaws PDF documentation. Set the variable arg_rewrite_mod in your server configuration to the name of an Erlang module supporting rewriting:
arg_rewrite_mod = my_rewriter
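In a typical yaws.conf, this setting sits inside the relevant server block; for example (a sketch, with an illustrative server name, port, and docroot):
<server www.domain.com>
        port = 80
        listen = 0.0.0.0
        docroot = /var/www
        arg_rewrite_mod = my_rewriter
</server>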
To support rewriting, the my_rewriter module must define and export an arg_rewrite/1 function, taking an #arg{} record as its argument:
-module(my_rewriter).
-export([arg_rewrite/1]).

-include_lib("yaws/include/yaws_api.hrl").

rewrite_pages() ->
    ["/listen"].

arg_rewrite(Arg) ->
    Req = Arg#arg.req,
    {abs_path, Path} = Req#http_request.path,
    case lists:member(Path, rewrite_pages()) of
        true ->
            Arg#arg{req = Req#http_request{path = {abs_path, Path ++ ".yaws"}}};
        false ->
            Arg
    end.
The code includes yaws_api.hrl to pick up the #arg{} record definition.
The rewrite_pages/0 function returns a list of pages that must be rewritten to include ".yaws" suffixes; in this example, it's just the /listen page you mention in your question. If in arg_rewrite/1 we find the requested page in that list, we append ".yaws" to the page name and include it in a new #arg{} we return to Yaws, which then continues dispatching the request based on the new #arg{}.


str() is no longer usable to get the true value of a Text tfx.data_types.RuntimeParameter during pipeline execution

How can I get the string value of a tfx.orchestration.data_types.RuntimeParameter during pipeline execution?
Hi,
I'm defining a runtime parameter like data_root = tfx.orchestration.data_types.RuntimeParameter(name='data-root', ptype=str) for a base path, from which I define many subfolders for various components, like str(data_root)+'/model' for the model-serving path in tfx.components.Pusher().
It was working like a charm before I moved to tfx==1.12.0: str(data_root) now returns a JSON dump instead of the value.
To overcome that, I tried defining a runtime parameter for the model path, model_root = tfx.orchestration.data_types.RuntimeParameter(name='model-root', ptype=str), and then feeding the Pusher component the way I saw in many tutorials:
pusher = Pusher(model=trainer.outputs['model'],
                model_blessing=evaluator.outputs['blessing'],
                push_destination=tfx.proto.PushDestination(
                    filesystem=tfx.proto.PushDestination.Filesystem(base_directory=model_root)))
but I get a TypeError saying tfx.proto.PushDestination.Filesystem does not accept a RuntimeParameter.
This completely breaks the existing setup, as I receive those parameters from an external client for each Kubeflow run.
Thanks a lot for any help.
I was able to fix it.
First of all, the docstring is not clear about which parameters of Pusher can be RuntimeParameters.
I finally went to the __init__ definition of the Pusher component to see that only the push_destination parameter can be a RuntimeParameter:
def __init__(
    self,
    model: Optional[types.BaseChannel] = None,
    model_blessing: Optional[types.BaseChannel] = None,
    infra_blessing: Optional[types.BaseChannel] = None,
    push_destination: Optional[Union[pusher_pb2.PushDestination,
                                     data_types.RuntimeParameter]] = None,
    custom_config: Optional[Dict[str, Any]] = None,
    custom_executor_spec: Optional[executor_spec.ExecutorSpec] = None):
Then I defined the component accordingly, using my RuntimeParameter:
model_root = tfx.orchestration.data_types.RuntimeParameter(name='model-serving-location', ptype=str)
pusher = Pusher(model=trainer.outputs['model'],
                model_blessing=evaluator.outputs['blessing'],
                push_destination=model_root)
As the push_destination parameter is expected to be a tfx.proto.pusher_pb2.PushDestination proto message, you then have to respect the associated schema when instantiating and running a pipeline execution, meaning the value should look like:
{'model-serving-location': '{"filesystem": {"base_directory": "path/to/model/serving/for/the/run"}}'}
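For example, when launching a run with the Kubeflow Pipelines SDK, the parameter could be passed like this (a sketch assuming a KFP v1-style client; the endpoint and compiled pipeline path are placeholders):
import json
from kfp import Client

client = Client(host='<your-kfp-endpoint>')  # placeholder endpoint
client.create_run_from_pipeline_package(
    'pipeline.yaml',  # placeholder: your compiled pipeline
    arguments={
        # the value is the JSON-serialized PushDestination proto
        'model-serving-location': json.dumps(
            {'filesystem': {'base_directory': 'path/to/model/serving/for/the/run'}}
        ),
    },
)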
Regards

Is it possible to add YAWS appmods config at runtime?

I embedded YAWS in my application in a production environment, and I use the function yaws:start_embedded/4 to start YAWS.
Below is my code:
Id = "my_server",
GconfList = [{logdir, "./log"}, {id, Id}],
SconfList = [{docroot, Docroot},
             {port, Port},
             {listen, Listen},
             {appmods, [
                 {"/rest", mod_rest, []},
                 {"/file", mod_file, []}
             ]}
            ],
yaws:start_embedded(Docroot, SconfList, GconfList, Id).
I'd like to add another appmod, e.g. {"/upload", mod_upload, []}.
Is it possible to add appmods at runtime without restarting YAWS?
You can add appmods at runtime by first retrieving the current configuration, using it to create a new configuration containing your new appmods, and then setting the new configuration.
Call yaws_api:getconf/0 to get a 3-tuple {ok, GlobalConf, ServerConfs} where GlobalConf is the global Yaws configuration and ServerConfs is a list of lists of Yaws server configurations. The global conf is a record type named gconf, and the server conf is a record type named sconf; both of these record types are defined in the yaws.hrl header file.
Work through the server configurations to find the one containing the appmods you want to change. This is slightly tricky because you're dealing with a list of lists, and you need to keep the shape of the overall data structure unchanged.
Once you find the sconf, create a new sconf instance from it by adding your new appmod to its current list of appmods. Each element of the appmod list is a tuple consisting of a URL path for the appmod and the name of the appmod module. An appmod tuple can also optionally contain a third field consisting of a list of paths under the first path to be excluded; see the description of exclude_paths in the Yaws appmod documentation for more details.
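For instance, a tuple using that optional third field might look like {"/file", mod_file, ["internal"]}, which (as an illustrative sketch) excludes requests under /file/internal from being handled by mod_file.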
Replace the existing sconf value in ServerConfs with your new value.
Call yaws_api:setconf/2 to set the new configuration, passing the existing GlobalConf as the first argument and the new ServerConfs containing your new sconf as the second argument.
The am_extend module below shows how to do this. It exports an add/1 function that takes a function that can identify and augment the appmods in the particular server you care about.
-module(am_extend).
-export([add/1]).

add(AppmodAdder) ->
    {ok, GlobalConf, ServerConfs} = yaws_api:getconf(),
    NewServerConfs = add_appmod(ServerConfs, AppmodAdder),
    yaws_api:setconf(GlobalConf, NewServerConfs).

add_appmod(ServerConfs, AppmodAdder) ->
    lists:foldl(fun(Val, Acc) ->
                        Acc ++ [AppmodAdder(A) || A <- Val]
                end, [], ServerConfs).
An example of using this code is to pass the function below as the AppmodAdder argument for am_extend:add/1. For this example, we're looking for a server that has an appmod path "/sse" so we can add another appmod to that server for the path "/sse2". Any server conf we don't care about is just returned unchanged.
-include_lib("yaws/include/yaws.hrl").

add_sse2(#sconf{appmods=AM}=SC) ->
    case lists:keyfind("/sse", 1, AM) of
        false ->
            SC;
        _ ->
            SC#sconf{appmods=[{"/sse2", my_sse_module}|AM]}
    end.
Note that our add_sse2/1 function must be compiled with yaws.hrl included so it has the definition for the sconf record available.
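Putting it together, the call would look something like this (a sketch; appmod_helpers is a hypothetical module that includes yaws.hrl and exports add_sse2/1):
am_extend:add(fun appmod_helpers:add_sse2/1).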

Handling WebExceptions properly?

I have the following F# program that retrieves a webpage from the internet:
open System.Net

[<EntryPoint>]
let main argv =
    let mutable pageData : byte[] = [| |]
    let fullURI = "http://www.badaddress.xyz"
    let wc = new WebClient()
    try
        pageData <- wc.DownloadData(fullURI)
        ()
    with
    | :? System.Net.WebException as err -> printfn "Web error: \n%s" err.Message
    | exn -> printfn "Unknown exception:\n%s" exn.Message
    0 // return an integer exit code
This works fine if the URI is valid and the machine has an internet connection and the web server responds properly etc. In an ideal functional programming world the results of a function would not depend on external variables not passed as arguments (side effects).
What I would like to know is what is the appropriate F# design pattern to deal with operations which might require the function to deal with recoverable external errors. For example if the website is down one might want to wait 5 minutes and try again. Should parameters like how many times to retry and delays between retries be passed explicitly or is it OK to embed these variables in the function?
In F#, when you want to handle recoverable errors you almost universally want to use the option or the Choice<_,_> type. In practice the only difference between them is that Choice allows you to return some information about the error while option does not. In other words, option is best when it doesn't matter how or why something failed (only that it did fail); Choice<_,_> is used when having information about how or why something failed is important. For example, you might want to write the error information to a log; or perhaps you want to handle an error situation differently based on why something failed -- a great use case for this is providing accurate error messages to help users diagnose a problem.
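As a small illustration of the difference (a sketch, separate from the refactored code below):
// option: we only care *that* parsing failed.
let tryParseInt (s : string) : int option =
    match System.Int32.TryParse s with
    | true, value -> Some value
    | false, _ -> None

// Choice: we also want to report *why* it failed.
let tryParseIntChoice (s : string) : Choice<int, string> =
    match System.Int32.TryParse s with
    | true, value -> Choice1Of2 value
    | false, _ -> Choice2Of2 (sprintf "'%s' is not a valid integer." s)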
With that in mind, here's how I'd refactor your code to handle failures in a clean, functional style:
open System
open System.Net

/// Retrieves the content at the given URI.
let retrievePage (client : WebClient) (uri : Uri) =
    // Preconditions
    // (checkNonNull comes from the author's ExtCore library, mentioned below.)
    checkNonNull "uri" uri
    if not <| uri.IsAbsoluteUri then
        invalidArg "uri" "The URI must be an absolute URI."

    try
        // If the data is retrieved successfully, return it.
        client.DownloadData uri
        |> Choice1Of2
    with
    | :? System.Net.WebException as webExn ->
        // Return the URI and WebException so they can be used to diagnose the problem.
        Choice2Of2 (uri, webExn)
    | _ ->
        // Reraise any other exceptions -- we don't want to handle them here.
        reraise ()

/// Retrieves the content at the given URI.
/// If a WebException is raised when retrieving the content, the request
/// will be retried up to a specified number of times.
let rec retrievePageRetry (retryWaitTime : TimeSpan) remainingRetries (client : WebClient) (uri : Uri) =
    // Preconditions
    checkNonNull "uri" uri
    if not <| uri.IsAbsoluteUri then
        invalidArg "uri" "The URI must be an absolute URI."
    elif remainingRetries = 0u then
        invalidArg "remainingRetries" "The number of retries must be greater than zero (0)."

    // Try to retrieve the page.
    match retrievePage client uri with
    | Choice1Of2 _ as result ->
        // Successfully retrieved the page. Return the result.
        result
    | Choice2Of2 _ as error ->
        // Decrement the number of retries.
        let retries = remainingRetries - 1u

        // If there are no retries left, return the error along with the URI
        // for diagnostic purposes; otherwise, wait a bit and try again.
        if retries = 0u then error
        else
            // NOTE : If this is modified to use 'async', you MUST
            // change this to use 'Async.Sleep' here instead!
            System.Threading.Thread.Sleep retryWaitTime

            // Try retrieving the page again.
            retrievePageRetry retryWaitTime retries client uri

[<EntryPoint>]
let main argv =
    /// WebClient used for retrieving content.
    use wc = new WebClient ()

    /// The amount of time to wait before re-attempting to fetch a page.
    let retryWaitTime = TimeSpan.FromSeconds 2.0

    /// The maximum number of times we'll try to fetch each page.
    let maxPageRetries = 3u

    /// The URI to fetch.
    let fullURI = Uri ("http://www.badaddress.xyz", UriKind.Absolute)

    // Fetch the page data.
    match retrievePageRetry retryWaitTime maxPageRetries wc fullURI with
    | Choice1Of2 pageData ->
        printfn "Retrieved %u bytes from: %O" (Array.length pageData) fullURI
        0 // Success
    | Choice2Of2 (uri, error) ->
        printfn "Unable to retrieve the content from: %O" uri
        printfn "HTTP Status: (%i) %O" (int error.Status) error.Status
        printfn "Message: %s" error.Message
        1 // Failure
Basically, I split your code out into two functions, plus the original main:
One function that attempts to retrieve the content from a specified URI.
One function containing the logic for retrying attempts; this 'wraps' the first function which performs the actual requests.
The original main function now only handles 'settings' (which you could easily pull from an app.config or web.config) and printing the final results. In other words, it's oblivious to the retrying logic -- you could modify the single line of code with the match statement and use the non-retrying request function instead if you wanted.
If you want to pull content from multiple URIs AND wait for a significant amount of time (e.g., 5 minutes) between retries, you should modify the retrying logic to use a priority queue or something instead of using Thread.Sleep or Async.Sleep.
Shameless plug: my ExtCore library contains some things to make your life significantly easier when building something like this, especially if you want to make it all asynchronous. Most importantly, it provides an asyncChoice workflow and collections functions designed to work with it.
As for your question about passing in parameters (like the retry timeout and number of retries) -- I don't think there's a hard-and-fast rule for deciding whether to pass them in or hard-code them within the function. In most cases, I prefer to pass them in, though if you have more than a few parameters to pass in, you're better off creating a record to hold them all and passing that instead. Another approach I've used is to make the parameters option values, where the defaults are pulled from a configuration file (though you'll want to pull them from the file once and assign them to some private field to avoid re-parsing the configuration file each time your function is called); this makes it easy to modify the default values you've used in your code, but also gives you the flexibility of overriding them when necessary.
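For instance, bundling the retry settings into a record might look like this (a minimal sketch; the names are illustrative, not from the original code):
/// Retry settings bundled into a single value.
type RetryPolicy = {
    /// Time to wait between attempts.
    RetryWaitTime : System.TimeSpan
    /// Maximum number of attempts.
    MaxRetries : uint32
}

/// Defaults, which could instead be read once from a configuration file.
let defaultRetryPolicy = {
    RetryWaitTime = System.TimeSpan.FromSeconds 2.0
    MaxRetries = 3u
}

// Callers override individual fields with copy-and-update syntax:
let patientPolicy = { defaultRetryPolicy with RetryWaitTime = System.TimeSpan.FromMinutes 5.0 }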

Create suite of interdependent Lua files without affecting the global namespace

tl;dr: What design pattern allows you to split Lua code over multiple files that need to share some information without affecting the global table?
Background
It is considered bad form to create a library in Lua where requiring the library affects the global namespace:
--> somelib.lua <--
SomeLib = { ... }
--> usercode.lua <--
require 'somelib'
print(SomeLib) -- global key created == bad
Instead, it is considered a best practice to create a library that uses local variables and then returns them for the user to assign as they see fit:
--> somelib.lua <--
local SomeLib = { ... }
return SomeLib
--> usercode.lua <--
local theLib = require 'somelib' -- consumers name lib as they wish == good
The above pattern works fine when using a single file. However, this becomes considerably harder when you have multiple files that reference each other.
Concrete Example
How can you rewrite the following suite of files so that the assertions all pass? Ideally the rewrites will leave the same files on disk and responsibilities for each file. (Rewriting by merging all code into a single file is effective, but not helpful ;)
--> test_usage.lua <--
require 'master'
assert(MASTER.Simple)
assert(MASTER.simple)
assert(MASTER.Shared)
assert(MASTER.Shared.go1)
assert(MASTER.Shared.go2)
assert(MASTER.Simple.ref1()==MASTER.Multi1)
assert(pcall(MASTER.Simple.ref2))
assert(_G.MASTER == nil) -- Does not currently pass
--> master.lua <--
MASTER = {}
require 'simple'
require 'multi'
require 'shared1'
require 'shared2'
require 'shared3'
require 'reference'
--> simple.lua <--
MASTER.Simple = {}
function MASTER:simple() end
--> multi.lua <--
MASTER.Multi1 = {}
MASTER.Multi2 = {}
--> shared1.lua <--
MASTER.Shared = {}
--> shared2.lua <--
function MASTER.Shared:go1() end
--> shared3.lua <--
function MASTER.Shared:go2() end
--> reference.lua <--
function MASTER.Simple:ref1() return MASTER.Multi1 end
function MASTER.Simple:ref2() MASTER:simple() end
Failure: Setting the Environment
I thought to solve the problem by setting the environment to my master table with a self-reference. This does not work when calling functions like require however, as they change the environment back:
--> master.lua <--
foo = "original"
local MASTER = setmetatable({foo="captured"},{__index=_G})
MASTER.MASTER = MASTER
setfenv(1,MASTER)
require 'simple'
--> simple.lua <--
print(foo) --> "original"
MASTER.Simple = {} --> attempt to index global 'MASTER' (a nil value)
You are giving master.lua two responsibilities:
1. It defines the common module table
2. It imports all of the submodules
Instead you should create a separate module for (1) and import it in all of the submodules:
--> common.lua <--
return {}
--> master.lua <--
require 'simple'
require 'multi'
require 'shared1'
require 'shared2'
require 'shared3'
require 'reference'
return require'common' -- return the common table
--> simple.lua <--
local MASTER = require'common' -- import the common table
MASTER.Simple = {}
function MASTER:simple() end
etc.
Finally, change the first line of test_usage.lua to use a local variable:
--> test_usage.lua <--
local MASTER = require'master'
...
The tests should now pass.
I have a systematic way to solve that problem. I have refactored your module in a Git repository to show you how it works: https://github.com/catwell/dont-touch-global-namespace/commit/34b390fa34931464c1dc6f32a26dc4b27d5ebd69
The idea is that you should have the sub-parts return a function that takes the main module as an argument.
If you cheat by having master.lua open the source files, append a header and a footer, and use loadstring, you can even use them unmodified (only master.lua has to be modified, but it is more complex). Personally, I prefer to keep it explicit, which is what I have done here. I don't like magic :)
EDIT: it is very close to Andrew Stark's first solution, except I patch the MASTER table directly in the sub-modules. The advantage is that you can define several things at once, like in your simple.lua, multi.lua and reference.lua files.
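In outline, the pattern looks something like this (a minimal sketch, not the exact code from the repository):
--> simple.lua <--
return function(MASTER)
   MASTER.Simple = {}
   function MASTER:simple() end
end

--> master.lua <--
local MASTER = {}
require'simple'(MASTER)
-- ...apply the other sub-modules the same way...
return MASTER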
We can solve the problem by changing the master file to modify the environment in which all required code is run:
--> master.lua <--
local m = {} -- The actual master table
local env = getfenv(0) -- The current environment
local sandbox = { MASTER=m } -- Environment for all requires
setmetatable(sandbox,{__index=env}) -- ...also exposes read access to real env
setfenv(0,sandbox) -- Use the sandbox as the environment
-- require all files as before
setfenv(0,env) -- Restore the original environment
return m
The sandbox is an empty table that inherits values from _G but that also has a reference to the MASTER table, simulating a global from the perspective of later code. Using this sandbox as the environment causes all later requires to evaluate their "global" code in this context.
We save the real environment for later restoration, so that we don't mess with any later code that might want to actually set a global variable.
The question concerns:
Not polluting the global space when making modules.
Making modules in such a way that they might be split into multiple files, for maintenance reasons, among others.
My solution to the above problem lies in tweaking the "return as table" idiom in Lua such that instead of returning a table, you return a function that returns a table, when state needs to be passed between sub-modules.
This works well for sub-modules that are entirely dependent upon some root-module. If they are loaded independently, then they require the user to know that they need to call the module before they can use it. This is unlike every other module that has a collection of methods, ready to go from local a = require('a').
At any rate, this works like so:
-- callbacks.lua -- a sub-module
return function(self)
   local callbacks = {}
   callbacks.StartElement = function(parser, elementName, attributes)
      local res = {}
      local stack = self.stack
      -- ...awesome stuff for about 150 lines...
   end
   return callbacks
end
To use it, you can...
local make_callbacks = require'callbacks'
self.callbacks = make_callbacks(self)
Or, better yet, simply call the return value of require when assigning the callback table to the parent module, like so:
self.callbacks = require'trms.xml.callbacks'(self)
Most often, I try not to do this. If I'm passing state or self between submodules, I find that I'm often doing it wrong. My internal policy is that if I'm doing something that is highly related to another file, I might be okay. More likely, I'm putting something in the wrong spot and there is a way to do it without passing anything between modules.
The reason I don't like this is that the table I pass around has methods and properties that are unseen in the file I am working within. I'm not free to refactor the internal implementation of one of my files without horking the others. So, I humbly suggest that this idiom is a yellow flag, but probably not a red one. :)
While this solves the problem of state-sharing without globals, it doesn't really protect the user from the accidental omission of local. If I may speak to that implied question...
The first thing that I do is remove access to the global environment from my module: since it's only available as long as I don't reset _ENV, resetting _ENV is the first thing I do. This is done by packing only what is needed into a new _ENV table.
_ENV = {print = print,
        pairs = pairs, --etc
       }
However, constantly re-typing all of the things that I need from lua into each file is a giant, error-prone pain. To avoid this, I make one file in my module's base directory and use it as the home for all of my modules' and sub-modules' common environments. I call it _ENV.lua.
Note: I cannot use "init.lua" or any other root-module for this purpose, because I need to be able to load it from the sub-modules, which are being loaded by the root-module, which loads the sub-modules, which are...
My abbreviated _ENV.lua file looks something like the following:
--_ENV.lua
_ENV = {
   type = type, pairs = pairs, ipairs = ipairs, next = next,
   print = print, require = require, io = io, table = table, string = string,
   lxp = require"lxp", lfs = require"lfs",
   socket = require("socket"), lpeg = require'lpeg', --etc..
}
return _ENV
With this file, I now have a common base from which to work.
All of my other modules load this first, using the following command:
_ENV = require'root_mod._ENV' --where root_mod is the base of my module.
This facility was critical for me, for two reasons. First, it keeps me out of global space. If I see that I am missing something from the global environment _G (it took me a surprisingly long time before I saw that I didn't have tostring!), I can go back into my _ENV.lua file and add it. As a required file, this only gets loaded one time, so having it applied to all of my submodules is 0 calories.
Second, I find that it gives me everything I really need for using the "return module as table" protocol, with only a few exceptions where "return a function that returns a table" is needed.
TL;DR: Don't return the module, set package.loaded[...] = your_module as early as possible (can still be empty), then just require the module in submodules and it will be properly shared.
The clean way to do this is to explicitly register the module and not rely on require to implicitly register it at the end. The documentation says:
require (modname)
Loads the given module. The function starts by looking into the
package.loaded table to determine whether modname is already loaded.
If it is, then require returns the value stored at
package.loaded[modname]. [This gets you the caching behavior that
every file is run only once.] Otherwise, it tries to find a loader for
the module. [And one of the searchers is looking for Lua files to run,
which gets you the usual file loading behavior.]
[…]
Once a loader is found, require calls the loader with two arguments:
modname and an extra value dependent on how it got the loader. (If the
loader came from a file, this extra value is the file name.) If the loader
returns any non-nil value [e.g. your file returns the module table],
require assigns the returned value to package.loaded[modname]. If the
loader does not return a non-nil value and has not assigned any value to
package.loaded[modname], then require assigns true to this entry.
In any case, require returns the final value of
package.loaded[modname].
(emphasis, [comments] added by me.)
With the return mymodule idiom, the caching behavior fails if you have a loop in your dependencies – the cache is updated too late. (As a result, files may be loaded several times (you may even get endless loops!) and sharing will fail.) But explicitly saying
local _M = { }            -- your module, however you define / name it
package.loaded[...] = _M  -- recall: require calls loader( modname, something )
                          -- so `...` is `modname, something`, which is shortened
                          -- to just `modname` because only one value is used
immediately updates the cache, so that other modules can already require your module before its main chunk returned. (Of course, at that time they can only actually use what's already been defined. But that's not usually a problem.)
The package.loaded[...] = mymodule approach works in 5.1–5.3 (incl. LuaJIT).
For your example, you would adjust the start of master.lua to
1c1,2
< MASTER = {}
---
> local MASTER = {}
> package.loaded[...] = MASTER
and for all other files
0a1
> local MASTER = require "master"
and you're done.
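To see why the early registration matters, here is a minimal sketch (hypothetical files a.lua and b.lua) of two modules that require each other and still load correctly:
--> a.lua <--
local A = {}
package.loaded[...] = A -- register before requiring b
local B = require 'b'   -- b can already see A via the cache
function A.hello() return 'hello from a' end
return A

--> b.lua <--
local B = {}
package.loaded[...] = B
local A = require 'a'   -- returns the cached (possibly still incomplete) A
return B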

How Lua's require works

I'm using a graphics library that lets you program in Lua. I needed A* pathfinding, so I found a library online: just one Lua file that does the pathfinding, plus one example file. In the example file it uses the object like:
-- Loading the library
local Astar = require 'Astar'
Astar(map,1) -- Inits the library, sets the OBST_VALUE to 1
I run the script and everything works. So now I add the Astar.lua file to the path where my graphics engine is running, do the same thing, and get this error on the Astar(map, 1) line:
"attempt to call local 'AStar' (a number value)"
Any ideas why I would be getting that error when I'm doing the same thing as the example that comes with this AStar lib?
Here is part of the Astar file:
-- The Astar class
local Astar = {}
setmetatable(Astar, {__call = function(self,...) return self:init(...) end})
Astar.__index = Astar

-- Loads the map, sets the unwalkable value, inits pathfinding
function Astar:init(map, obstvalue)
   self.map = map
   self.OBST_VALUE = obstvalue or 1
   self.cList = {}
   self.oList = {}
   self.initialNode = false
   self.finalNode = false
   self.currentNode = false
   self.path = {}
   self.mapSizeX = #self.map[1]
   self.mapSizeY = #self.map
end
Note that when I run this from my graphics engine, require returns 1, but when run from the example it came with, it returns a table, which is what it should return. So I'm not sure why it would only be returning 1.
How is Astar getting added to the package.loaded table for the example script, as opposed to your code?
QUICK LUA SYNTACTIC SUGAR REVIEW:
func 'string' is equivalent to func('string')
tabl.ident is equivalent to tabl['ident']
When you run a script using require('Astar'), this is what it does:
1. Checks if package.loaded['Astar'] is a non-nil value. If it is, it returns this value. Otherwise it continues down this list.
2. Runs through filenames of the patterns listed in package.path (and package.cpath), with '?' replaced with 'Astar', until it finds the first file matching a pattern.
3. Sets package.loaded['Astar'] to true.
4. Runs the module script (found via the path search above; for the sake of this example we'll assume it's not a C module) with 'Astar' as an argument (accessible as ... in the module script).
5. If the script returns a value, this value is placed into package.loaded['Astar'].
6. The contents of package.loaded['Astar'] are returned.
Note that the script can load the package into package.loaded['Astar'] as part of its execution and return nothing.
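Given all that, a quick sanity check is to inspect what require actually returned and what is sitting in the cache (a small diagnostic sketch):
local Astar = require 'Astar'
print(type(Astar))             --> "table" if the script returned its class table
print(package.loaded['Astar']) --> the cached value require will return next time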
As somebody noted in the comments above, your problem may come from loading the module using 'AStar' instead of 'Astar'. It's possible that Lua is loading the script under that string (since, on case-insensitive Windows, a search for a file named "AStar.lua" will open a file called "Astar.lua"), but the script isn't operating with that string (it uses a hard-coded "Astar" instead of the "AStar" that Lua is loading the script under).
You need to add return Astar at the end of Astar.lua.
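That is, the very end of the file should hand the class table back to require:
-- at the bottom of Astar.lua
return Astar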
