Is Solr's LTR plugin practical for large models? - machine-learning

When using the Solr LTR plugin's MultipleAdditiveTreesModel and NeuralNetworkModel classes, your model is defined in a JSON config that is loaded into a model store.
It is my understanding that complex tree models and neural-network (NN) models can get very large when serialised to JSON. I think certain NN models could comfortably exceed 500MB in that form.
The documentation for the plugin acknowledges that "large" models can fail to load because of a ZooKeeper limit (ref: https://lucene.apache.org/solr/guide/8_4/learning-to-rank.html#using-large-models). The page linked as a possible fix says that the limit is 1MB by default (!?).
Should the Solr LTR plugin only be used for simple use-cases where you can keep your model simple enough to be <10MB when serialised?

According to the Solr LTR documentation, you can use DefaultWrapperModel and keep your model file on disk.
First, add the path to the models directory in the lib section of solrconfig.xml:
<lib dir="/path/to" regex="models" />
Then, configure your model:
{
  "store" : "largeModelsFeatureStore",
  "name" : "myWrapperModel",
  "class" : "org.apache.solr.ltr.model.DefaultWrapperModel",
  "params" : {
    "resource" : "myModel.json"
  }
}
Finally, reference the model in your reranking query as model=myWrapperModel.
https://lucene.apache.org/solr/guide/8_4/learning-to-rank.html#using-large-models
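To judge whether a given model needs the wrapper approach at all, you can compare its serialised size with the ZooKeeper limit mentioned in the question. The following is only a rough sketch: the 1MB figure is the default quoted above and may differ on your cluster, and the helper name and the toy model hashes are made up for illustration.

```ruby
require 'json'

# Assumed default ZooKeeper limit of roughly 1MB; adjust to your cluster's setting.
ZK_LIMIT_BYTES = 1024 * 1024

# Hypothetical helper: true when the model, serialised to JSON,
# is larger than the given limit in bytes.
def exceeds_zk_limit?(model, limit = ZK_LIMIT_BYTES)
  JSON.generate(model).bytesize > limit
end

# Toy stand-ins for real models; only their serialised size matters here.
small_model = { 'name' => 'toyModel', 'params' => { 'weights' => [0.1, 0.2] } }
large_model = { 'name' => 'bigToyModel',
                'params' => { 'weights' => Array.new(200_000, 0.123456) } }

puts exceeds_zk_limit?(small_model) # small enough for the regular model store
puts exceeds_zk_limit?(large_model) # candidate for DefaultWrapperModel
```

A model that exceeds the limit is the case DefaultWrapperModel is meant for, since the model file then lives on disk rather than in ZooKeeper.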

Related

IBM Integration Bus and xsd:anyType

I'm working with IIB v9 mxsd message definitions. I'd like to define one of the XML elements to be of type xsd:anyType. However, in the list of types I can choose from, only anySimpleType and anyUri are possible (besides all other types like string, integer, etc.).
How can I get around this limitation?
The XMLNSC parser supports the entire XML Schema specification, including xs:any and xs:anyType. In IIBv9 you should create a Library and import your xsds into it. Link your Application to the Library and the XMLNSC parser will find and use the model. You do not need to specify the name of the Library in the node properties; the XSD model will be automatically available to the entire application.
You do not need to use a message set at all in IIBv9 and later versions.
The mxsd file format is used only by the MRM (not DFDL) parser.
You shouldn't use an MXSD to model your XML data; use a normal XSD instead.
MXSD is for modelling data for the DFDL parser, but you should use the XMLNSC parser for XML messages and define them in XSDs, in which you can use anyType.
As far as I know DFDL doesn't support anyType.

Two Step View Pattern

Martin Fowler's PoEAA catalog is like a repository for Ruby gems and Rails modules. For example, the ActiveRecord ORM from Rails is based on Fowler's Active Record pattern, and the DataMapper gem is based on the Data Mapper pattern. Are there any useful implementations of Martin Fowler's Two Step View pattern in Ruby, e.g. in combination with a template engine?
The pattern turns domain data into HTML in two steps. It is particularly interesting if you want to compose your views into decoupled, reusable view components.
One possible way to implement the Two Step View seems to be an XSLT transformation, for example with XML and Nokogiri. This means creating an intermediate XML representation of the page:
XML == (XSLT) ==> XML
XML == (XSLT) ==> HTML
A second possible solution is to use a JS template engine like vue.js, KnockoutJS, Ractive.js or React. Rails does the first step and creates an intermediate view, the JS template engine the second:
Rails Template == (Rails) ==> View-Template
View-Template + JSON-Data == (vue.js/KnockoutJS/Ractive.js/React) ==> HTML
This is a new pattern to me, but I can see two ways to conceptualize it. The core of the pattern seems to be building an intermediate representation first, then running it through a formatting step. In each case, the outcome is a view that looks identical regardless of which class of ActiveRecord model is being displayed.
Option 1: Intermediate Ruby Object
Using a Presenter library (Draper, ActiveDecorator, roll-your-own), you can normalize multiple ActiveRecord classes to a single public API. Then you write a single view template that can render objects with that API.
In this case you create a single view template plus one Presenter object for each ActiveRecord class you need to render. If you need to add new data to the page, you have to touch the template and all of the Presenter classes.
Option 2: HTML + CSS
It's odd, but I think HTML is a valid format to represent data, and could be considered as an intermediate, unformatted representation.
In this case you create a view template (probably a partial, possibly polymorphic) for each ActiveRecord class that produces (nearly) identical HTML. Then you use a CSS component framework to "format" the HTML into identical renderings. The HTML doesn't need to be strictly identical, as long as it all conforms to what your component framework is expecting. Adding data here means changing each view template (the CSS usually won't need modification).
I think both of these approaches are valid. The second feels more "rails-y" to me, but I think it's a departure from the spirit of the pattern, even if it technically conforms (which might be debatable).
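Option 1 can be sketched in plain Ruby. This is only an illustration under stated assumptions: Article and Product are hypothetical stand-ins for ActiveRecord models, the presenters are hand-rolled rather than taken from Draper or ActiveDecorator, and ERB from the standard library plays the role of the template engine.

```ruby
require 'erb'

# Hypothetical stand-ins for two unrelated ActiveRecord models.
Article = Struct.new(:headline, :body)
Product = Struct.new(:name, :description)

# Step 1: each presenter normalizes its model to the same API (title, summary).
class ArticlePresenter
  def initialize(article)
    @article = article
  end

  def title
    @article.headline
  end

  def summary
    @article.body
  end
end

class ProductPresenter
  def initialize(product)
    @product = product
  end

  def title
    @product.name
  end

  def summary
    @product.description
  end
end

# Step 2: a single template renders anything that exposes title and summary.
CARD_TEMPLATE = ERB.new(<<~HTML)
  <div class="card">
    <h2><%= ERB::Util.html_escape(item.title) %></h2>
    <p><%= ERB::Util.html_escape(item.summary) %></p>
  </div>
HTML

def render_card(item)
  CARD_TEMPLATE.result_with_hash(item: item)
end

puts render_card(ArticlePresenter.new(Article.new('Two Step View', 'A PoEAA pattern.')))
puts render_card(ProductPresenter.new(Product.new('Widget', 'A useful thing.')))
```

Adding a new field to the page means touching the template and every presenter, which is the trade-off described under Option 1.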

How can I dynamically include modules in nested directories?

I want to traverse a directory structure and dynamically load whatever modules I find there.
The purpose for doing so is to run a series of validations. If a top-level validation fails, any child validations will not be run.
My thinking was that a controller object could scan the directories, build up a hierarchy of modules and then make the decisions on whether or not to traverse a particular part of the tree based on the success/failure of higher-level validations.
For example, I might have a series of validations I want to run against a regex; however, none of the validations should be run if the regex doesn't exist or is empty. In this case, the top-level directory would contain just the exists validation, and a child directory would contain all the other validations to be run if the regex exists.
Being able to define these validations in separate files and create the needed hierarchy would be extremely useful for ease of adding additional validations later, rather than having to crack open an existing class and add methods.
Is there a way an application can dynamically scan a directory, save the filenames in a collection and then use the elements of that collection in a require? I don't think so. What about a load?
Is there any way to achieve such a design? Or am I thinking about it all wrong and should think of some other methodology instead?
Your request is very doable, but no language will do it for you automatically. You have to write the code to dive into the directories, determine the existence of the tests and then decide whether you should drill down further.
Ruby will help you though. There is the Find module, which is included in the standard library. This is from its docs:
The Find module supports the top-down traversal of a set of file paths.
For example, to total the size of all files under your home directory,
ignoring anything in a "dot" directory (e.g. $HOME/.ssh):
require 'find'

total_size = 0

Find.find(ENV["HOME"]) do |path|
  if FileTest.directory?(path)
    if File.basename(path)[0] == ?.
      Find.prune # Don't look any further into this directory.
    else
      next
    end
  else
    total_size += FileTest.size(path)
  end
end
From that code you would look for the signatures of the files and embedded folders, to decide if you should drill down further. For each file found that is one you want, use require to load it.
You can find other examples online showing how people use Find. The Dir class offers similar functionality through glob, except that you have to tell it where to descend; you can then iterate over the returned results.
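A minimal sketch of the design from the question, using Dir rather than Find. Everything here is an assumption made for illustration: the Validations registry, the run_validations method, and the convention that each validation file registers a named block when it is loaded. Child directories are only visited if every validation in the parent directory passed.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical registry: each validation file calls Validations.register.
module Validations
  @checks = []

  class << self
    def register(name, &block)
      @checks << [name, block]
    end

    # Return the checks registered since the last call and clear the list.
    def drain
      drained = @checks
      @checks = []
      drained
    end
  end
end

# Load every *.rb file directly inside dir, run the checks they register,
# and descend into subdirectories only if all of them passed.
def run_validations(dir, subject, results = {})
  Dir.glob(File.join(dir, '*.rb')).sort.each { |file| load File.expand_path(file) }
  checks = Validations.drain
  passed = checks.map { |name, check| results[name] = check.call(subject) ? true : false }.all?
  return results unless passed

  Dir.children(dir).sort.each do |entry|
    path = File.join(dir, entry)
    run_validations(path, subject, results) if File.directory?(path)
  end
  results
end

# Demo with a throwaway directory tree: one top-level check, one child check.
Dir.mktmpdir do |root|
  File.write(File.join(root, 'exists.rb'),
             "Validations.register(:exists) { |s| !s.nil? && !s.empty? }\n")
  FileUtils.mkdir_p(File.join(root, 'details'))
  File.write(File.join(root, 'details', 'anchored.rb'),
             "Validations.register(:anchored) { |s| s.start_with?('^') }\n")

  p run_validations(root, '^abc$') # both checks run
  p run_validations(root, '')      # only :exists runs; the child directory is skipped
end
```

The sketch uses load rather than require because require loads a given file only once per process, so a second run would find no newly registered checks. If the validations only ever run once, require with the same paths works as well.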

Symfony: multiple i18n sources

For my project, I need to store translations in the database, for which I've implemented doctrine data source. However, I would like to leave standard translations (sf_admin and messages) in xml and keep them under source control. Is it possible to have 2 i18n instances that use different data sources? Or maybe one instance that can load data from different sources according to dictionary name?
I don't think there is a solution that doesn't require overriding sfI18n. An sfMessageSource_Aggregate exists but it seems nearly impossible to configure factories.yml to initialize it correctly.
You probably need to implement your own sfI18n::createMessageSource that constructs the Aggregate, passing the different sources to its constructor.

Grails fixtures

I am trying to use the Fixtures plugin to load initial (seed) data, but the documentation seems very short. Can anyone give some details about:
1. where to define all the data, and in which order
2. how to specify complex data types (Joda-Time, currency, etc.)
3. how to load the fixture data only once, as initial data
Thanks.
Grails Fixtures plugin documentation is now quite ok, check it here
After installing the plugin you will have a new folder in your Grails App directory called "fixtures". There you may store *.groovy files with the given test data written in the documented DSL.
Example init.groovy file:
// Import needed classes
// Defining some initial testdata
fixture {
    cat0(Category, name: "My category 1")
    cat1(Category, name: "My category 2")
}
The fixture definitions have to be in the fixture closure.
Edit
Even though the original link to the documentation is not online anymore, the content can still be found in its repository.

Resources