Weka ThresholdSelector and CostSensitiveClassifier on stream learning


Are Weka's ThresholdSelector and/or CostSensitiveClassifier compatible with stream learning (updateable classifiers)? My goal is to use them with weka.classifiers.meta.MOA to focus learning on a specific class and minimize false negatives (FN) on imbalanced data.
Thanks a lot!

Following my post on the Weka Pentaho forum, the answer is that neither ThresholdSelector nor CostSensitiveClassifier supports updateable classifiers, so stream learning with those meta classifiers is not currently possible.
Therefore, I am proposing draft code to create an updateable version of each classifier. Any comments or suggestions would be very welcome.
weka.classifiers.meta.CostSensitiveClassifier code update to create an updateable version (this one "seems" the most straightforward):
/*
weka.classifiers.meta.CostSensitiveClassifier: draft code update and questions to make it compatible with updateable classifiers
*/
import weka.classifiers.UpdateableClassifier;
....
implements ... UpdateableClassifier;
...
protected boolean classifierAlreadyUpdated = false;
public void updateClassifier(Instance instance) throws Exception {
if (!instance.classIsMissing()) {
if (m_Classifier == null)
throw new Exception("No base classifier has been set!");
// not sure how to properly check whether m_CostMatrix has already been fully initialized, here or elsewhere (i.e. by an external call to buildClassifier)
if (m_CostMatrix == null || (m_CostMatrix.size() == 1 && !classifierAlreadyUpdated)) {
Instances first = new Instances(instance.dataset(), 0); // empty dataset sharing the instance's header
first.add(instance);
buildClassifier(first); // re-use the initialization process from buildClassifier
classifierAlreadyUpdated = true;
}
else {
double factor = 1.0;
int classValIndex = (int) instance.classValue();
Object element = (classValIndex == 0) ? m_CostMatrix.getCell(classValIndex, 1) : m_CostMatrix.getCell(classValIndex, 0);
if (element instanceof Double) {
factor = ((Double) element).doubleValue();
} else {
factor = ((AttributeExpression) element).evaluateExpression(instance);
}
double weightOfInstance = instance.weight() * factor;
if (!m_MinimizeExpectedCost) {
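Instance weighted = (Instance) instance.copy(); // setWeight returns void, so weight a copy first
weighted.setWeight(weightOfInstance);
((UpdateableClassifier) m_Classifier).updateClassifier(weighted);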
} else {
((UpdateableClassifier)m_Classifier).updateClassifier(instance);
}
}
}
}
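To make the reweighting step in the draft easier to reason about, here is a minimal plain-Java sketch with no Weka dependency. It assumes a cost matrix stored as a `double[][]` whose rows are the actual class and whose columns are the predicted class; the class and method names (`CostWeightSketch`, `weightFactor`, `reweight`) are hypothetical and not part of Weka. For a two-class problem the factor reduces to the single off-diagonal cell that the draft reads with getCell.

```java
/** Plain-Java sketch of cost-based instance reweighting (hypothetical names, no Weka dependency). */
final class CostWeightSketch {

    /**
     * Weight factor for an instance of class actualClass: the summed cost of
     * misclassifying it as any other class (assumed layout: rows = actual, columns = predicted).
     */
    static double weightFactor(double[][] costMatrix, int actualClass) {
        double sum = 0.0;
        for (int predicted = 0; predicted < costMatrix[actualClass].length; predicted++) {
            if (predicted != actualClass) {
                sum += costMatrix[actualClass][predicted];
            }
        }
        return sum;
    }

    /** New instance weight = old weight * cost factor of its class. */
    static double reweight(double weight, double[][] costMatrix, int actualClass) {
        return weight * weightFactor(costMatrix, actualClass);
    }

    public static void main(String[] args) {
        // Missing a class-1 instance (a false negative) is assumed to cost 5x a false positive.
        double[][] cost = { {0.0, 1.0}, {5.0, 0.0} };
        System.out.println(reweight(1.0, cost, 0));
        System.out.println(reweight(1.0, cost, 1));
    }
}
```

With such a factor, each incoming instance of the rare class would simply carry more weight when it is passed to the base classifier's update, which is what the draft attempts when m_MinimizeExpectedCost is false.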
weka.classifiers.meta.ThresholdSelector code update to create an updateable version (waiting for your comments/suggestions):
/*
weka.classifiers.meta.ThresholdSelector draft code update and questions to make it compatible with updateable classifiers
I've got the big picture but I would need some help on findThreshold and the evaluation mode
findThreshold:
double low, high, maxValue and Instance maxInst => should become protected class fields so that they
stay up to date across the build and all updates, and can be reset when buildClassifier is called
Evaluation mode and getPredictions: should I create a new Evaluation mode ?
EVAL_TRAINING_SET does not seem a good option as it would skip the updateClassifier
I could then modify toString and add the code below to getPredictions ?
case EVAL_STREAM:
return eu.getTrainTestPredictions(m_Classifier, instances, instances);
For updateClassifier, please find below a draft code
*/
import weka.classifiers.UpdateableClassifier;
....
implements ... UpdateableClassifier;
...
protected boolean classifierAlreadyUpdated = false;
public void updateClassifier(Instance instance) throws Exception {
if (!instance.classIsMissing()) {
if (m_Classifier == null)
throw new Exception("No base classifier has been set!");
// Don't know how to properly check whether the classifier has already been fully initialized, here or elsewhere
if (!classifierAlreadyUpdated) {
Instances first = new Instances(instance.dataset(), 0); // empty dataset sharing the instance's header
first.add(instance);
buildClassifier(first); // re-use the initialization process from buildClassifier
classifierAlreadyUpdated = true;
}
else {
// getPredictions expects an Instances object, so wrap the single instance
Instances single = new Instances(instance.dataset(), 0);
single.add(instance);
// NOTE: 'stats' is assumed to be the class-attribute AttributeStats computed in
// buildClassifier and kept as a field; this draft does not define it here.
// If data contains only one instance of positive data
// optimize on training data
if (stats.distinctCount != 2) {
System.err.println("Couldn't find examples of both classes. No adjustment.");
((UpdateableClassifier) m_Classifier).updateClassifier(instance);
}
else {
// m_DesignatedClass: already initialized via buildClassifier (called if needed during first update)
if (m_manualThreshold) {
((UpdateableClassifier) m_Classifier).updateClassifier(instance);
return;
}
if (stats.nominalCounts[m_DesignatedClass] == 1) {
System.err.println("Only 1 positive found: optimizing on training data");
findThreshold(getPredictions(single, EVAL_TRAINING_SET, 0));
} else {
int numFolds = Math.min(m_NumXValFolds, stats.nominalCounts[m_DesignatedClass]);
findThreshold(getPredictions(single, m_EvalMode, numFolds));
if (m_EvalMode != EVAL_TRAINING_SET) {
((UpdateableClassifier) m_Classifier).updateClassifier(instance);
}
}
}
}
}
} // closes updateClassifier (this brace was missing in the draft)
Thanks
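For readers unfamiliar with what a threshold search does, the following plain-Java sketch shows one simple variant: try every observed probability as a candidate threshold and keep the one that maximizes the F-measure of the positive class. It is only an illustration under stated assumptions, not Weka's findThreshold; the names `ThresholdSketch`, `fMeasure` and `bestThreshold` are hypothetical, and a streaming version would presumably re-run it over a buffer of recent predictions.

```java
/** Plain-Java sketch of a threshold search (hypothetical names, not Weka's implementation). */
final class ThresholdSketch {

    /** F-measure of the positive class when predicting positive iff prob >= threshold. */
    static double fMeasure(double[] probs, boolean[] positive, double threshold) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < probs.length; i++) {
            boolean predictedPositive = probs[i] >= threshold;
            if (predictedPositive && positive[i]) {
                tp++;
            } else if (predictedPositive) {
                fp++;
            } else if (positive[i]) {
                fn++;
            }
        }
        return tp == 0 ? 0.0 : (2.0 * tp) / (2.0 * tp + fp + fn);
    }

    /** Tries each observed probability as a candidate and keeps the best; 0.5 if there is no data. */
    static double bestThreshold(double[] probs, boolean[] positive) {
        double best = 0.5;
        double bestF = -1.0;
        for (double candidate : probs) {
            double f = fMeasure(probs, positive, candidate);
            if (f > bestF) {
                bestF = f;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] probs = {0.9, 0.8, 0.4, 0.3, 0.1};
        boolean[] positive = {true, true, true, false, false};
        System.out.println(bestThreshold(probs, positive));
    }
}
```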

Related

How would one implement a doAfterNext operator in Project Reactor?

RxJava2 has a doAfterNext operator that emits items downstream, and then invokes the consumer. It doesn't seem like Project Reactor has such an operator so I'd like to get some pointers on the best way to create my own to achieve the same thing.
The use case is freeing memory after the subscriber has received the item

Not sure if leveraging doOnEach is a valid solution:
public class ByteBufferSafeReleaseConsumer implements Consumer<Signal<ByteBuffer<?>>> {
private final List<ByteBuffer<?>> elements = new ArrayList<>();
@Override
public void accept(Signal<ByteBuffer<?>> signal) {
if (signal.isOnNext()) {
ByteBuffer<?> next = signal.get();
if (next != null) {
elements.add(next);
}
}
if (signal.isOnComplete() || signal.isOnError()) {
for (ByteBuffer<?> buffer : elements) {
ByteBufferUtils.safeRelease(buffer);
}
}
}
}
ByteBufferSafeReleaseConsumer consumer = new ByteBufferSafeReleaseConsumer();
Flux.from(byteBufferPublisher).doOnEach(consumer);
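The doAfterNext semantics described in the question (deliver the item downstream first, then run the callback) can be sketched as a subscriber decorator. This sketch uses the JDK's java.util.concurrent.Flow interfaces rather than Reactor's own operator API, so wiring it into a Flux is not shown; `AfterNextSubscriber` and `AfterNextDemo` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;
import java.util.function.Consumer;

/** Forwards every signal downstream; after each onNext returns, runs the afterNext callback. */
final class AfterNextSubscriber<T> implements Flow.Subscriber<T> {
    private final Flow.Subscriber<? super T> downstream;
    private final Consumer<? super T> afterNext;

    AfterNextSubscriber(Flow.Subscriber<? super T> downstream, Consumer<? super T> afterNext) {
        this.downstream = downstream;
        this.afterNext = afterNext;
    }

    @Override public void onSubscribe(Flow.Subscription subscription) { downstream.onSubscribe(subscription); }

    @Override public void onNext(T item) {
        downstream.onNext(item);   // the subscriber sees the item first
        afterNext.accept(item);    // then the callback runs, e.g. to release a buffer
    }

    @Override public void onError(Throwable error) { downstream.onError(error); }

    @Override public void onComplete() { downstream.onComplete(); }
}

/** Drives the decorator by hand and records the order of the calls. */
final class AfterNextDemo {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        Flow.Subscriber<String> downstream = new Flow.Subscriber<String>() {
            @Override public void onSubscribe(Flow.Subscription s) { }
            @Override public void onNext(String item) { log.add("next:" + item); }
            @Override public void onError(Throwable t) { log.add("error"); }
            @Override public void onComplete() { log.add("complete"); }
        };
        AfterNextSubscriber<String> subscriber =
                new AfterNextSubscriber<>(downstream, item -> log.add("after:" + item));
        subscriber.onNext("a");
        subscriber.onComplete();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Compared with the doOnEach approach above, which collects every buffer and releases them only on completion or error, this releases each item as soon as the subscriber has returned from onNext.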

apache ignite datastreamer how to set data into ignitefuture?

I am creating a batch data streamer in Apache Ignite and need to control what happens after the data is received.
My batch has a structure:
public class Batch implements Binarylizable, Serializable {
private String eventKey;
private byte[] bytes;
etc..
Then I try to stream my data:
try (IgniteDataStreamer<Integer, Batch> streamer = serviceGrid.getIgnite().dataStreamer(cacheName);
StreamBatcher batcher = StreamBatcherFactory.create(event) ){
streamer.receiver(StreamTransformer.from(new BatchDataProcessor(event)));
streamer.autoFlushFrequency(1000);
streamer.allowOverwrite(true);
statusService.updateStatus(event.getKey(), StatusType.EXECUTING);
int counter = 0;
Batch batch = null;
IgniteFuture<?> future = null;
while ((batch = batcher.batch()) != null) {
future = streamer.addData(counter++, batch);
}
Object getted = future.get();
Just for testing, let's take only the last future and try to analyze this object. In the code above I'm using BatchDataProcessor, which looks like this:
public class BatchDataProcessor implements CacheEntryProcessor<Integer, Batch, Object> {
private final Event event;
private final String eventKey;
public BatchDataProcessor(Event event) {
this.event = event;
this.eventKey = event.getKey();
}
@Override
public Object process(MutableEntry<Integer, Batch> mutableEntry, Object... objects) throws EntryProcessorException {
Node node = NodeIgniter.node(Ignition.localIgnite().cluster().localNode().id());
ServiceGridContainer container = (ServiceGridContainer) node.getEnvironmentContainer().getContainerObject(ServiceGridContainer.class);
ProcessMarshaller marshaller = (ProcessMarshaller) container.getService(ProcessMarshaller.class);
LocalProcess localProcess = marshaller.intoProccessing(event.getLambdaExecutionKey());
try {
localProcess.addBatch(mutableEntry);
} catch (IOException e) {
e.printStackTrace();
} finally {
return new String("111");
}
}
}
So after localProcess.addBatch(mutableEntry) I want to send back information about the status of this particular batch. I think I should do this through the IgniteFuture object, but I can't find any information on how to control the future object returned by addData.
Can anybody help me understand where I can control the future returned by addData, or suggest some other way to implement a callback for a streamed batch?

When you do StreamTransformer.from(), you forfeit the result of your BatchDataProcessor, because
for (Map.Entry<K, V> entry : entries)
cache.invoke(entry.getKey(), this, entry.getValue());
// ^ result of cache.invoke() is discarded here
DataStreamer is for one-directional streaming of data. It is not supposed to return values as far as I know.
If you depend on the result of cache.invoke(), I recommend calling it directly instead of relying on DataStreamer.
BTW, be careful with fut.get(). You should do dataStreamer.flush() first, or DataStreamer's futures will wait indefinitely.

Groovy/Grails promises/futures. There is no .resolve(1,2,3) method. Strange?

I am developing in a Grails application. What I want to do is to lock the request/response, create a promise, and let someone else resolve it, that is somewhere else in the code, and then flush the response.
What I find really strange is that the Promise promise = task {} interface has no method that resembles resolve or similar.
I need to lock the response until someone resolves the promise, which is a global/static property set in development mode.
Promise interface:
http://grails.org/doc/latest/api/grails/async/Promise.html
I have looked at the GPars doc and can't find anything there that resembles a resolve method.
How can I create a promise, that locks the response or request, and then flushes the response when someone resolves it?

You can call get() on the promise, which will block until whatever the task is doing completes, but I imagine that is not what you want. What you want seems to be equivalent to a GPars DataflowVariable:
http://gpars.org/1.0.0/javadoc/groovyx/gpars/dataflow/DataflowVariable.html
Which allows using the left shift operator to resolve the value from another thread. Currently there is no way to use the left shift operator via Grails directly, but since Grails' promise API is just a layer over GPars this can probably be accomplished by using the GPars API directly with something like:
import org.grails.async.factory.gpars.*
import groovyx.gpars.dataflow.*
import static grails.async.Promises.*
def myAction() {
def dataflowVar = new DataflowVariable()
task {
// do some calculation and resolve data flow variable
def expensiveData = ...
dataflowVar << expensiveData
}
return new GparsPromise(dataflowVar)
}
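As a plain-JVM analogy for the "create a promise now, let someone else resolve it later" pattern (this is not the Grails or GPars API), java.util.concurrent.CompletableFuture offers an explicit complete() method. In this minimal sketch the class name `ResolveLaterSketch`, the resolver thread and the string value are all illustrative assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Sketch: one thread blocks on a promise until another thread resolves it. */
final class ResolveLaterSketch {

    static String awaitResolvedElsewhere() throws Exception {
        // The promise is created unresolved; nothing runs until someone completes it.
        CompletableFuture<String> promise = new CompletableFuture<>();

        // Somewhere else in the code (here: another thread) the value is supplied.
        Thread resolver = new Thread(() -> promise.complete("compiled css"));
        resolver.start();

        // The waiting side blocks, with a timeout so a missing resolve cannot hang forever.
        return promise.get(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(awaitResolvedElsewhere());
    }
}
```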

It took me quite some time to work through this and arrive at a working answer.
I must say that Grails appears to be quite a long way from making this work properly.
task { }
will always execute immediately, so the call is not put on hold until dispatch() or something similar is invoked, which is a problem.
Try this to see:
public def test() {
def dataflowVar = new groovyx.gpars.dataflow.DataflowVariable()
task {
// do some calculation and resolve data flow variable
println '1111111111111111111111111111111111111111111111111111'
//dataflowVar << expensiveData
}
return new org.grails.async.factory.gpars.GparsPromise(dataflowVar);
}
If you are wondering what this is for, it is to make the lesscss refresh automatically in grails, which is a problem when you are using import statements in less. When the file is touched, the lesscss compiler will trigger a recompilation, and only when it is done should it respond to the client.
On the client side I have some JavaScript that keeps replacing the last link using the refresh action here:
In my controller:
/**
* Refreshes link resources. refresh?uri=/resource/in/web-app/such/as/empty.less
*/
public def refresh() {
return LessRefresh.stackRequest(request, params.uri);
}
A class written for this:
import grails.util.Environment
import grails.util.Holders
import javax.servlet.AsyncContext
import javax.servlet.AsyncEvent
import javax.servlet.AsyncListener
import javax.servlet.http.HttpServletRequest
/**
* @author SecretService
*/
class LessRefresh {
static final Map<String, LessRefresh> FILES = new LinkedHashMap<String, LessRefresh>();
String file;
Boolean touched
List<AsyncContext> asyncContexts = new ArrayList<AsyncContext>();
String text;
public LessRefresh(String file) {
this.file = file;
}
/** Each request will be put on hold in a stack until dispatchAll below is called when the recompilation of the less file finished **/
public static AsyncContext stackRequest(HttpServletRequest request, String file) {
if ( !LessRefresh.FILES[file] ) {
LessRefresh.FILES[file] = new LessRefresh(file);
}
return LessRefresh.FILES[file].handleRequest(request);
}
public AsyncContext handleRequest(HttpServletRequest request) {
if ( Environment.current == Environment.DEVELOPMENT ) {
// We only touch it once since we are still waiting for the less compiler to finish from previous edits and recompilation
if ( !touched ) {
touched = true
touchFile(file);
}
AsyncContext asyncContext = request.startAsync();
asyncContext.setTimeout(10000)
asyncContexts.add (asyncContext);
asyncContext.addListener(new AsyncListener() {
@Override
void onComplete(AsyncEvent event) throws IOException {
event.getSuppliedResponse().writer << text;
}
@Override
void onTimeout(AsyncEvent event) throws IOException {
}
@Override
void onError(AsyncEvent event) throws IOException {
}
@Override
void onStartAsync(AsyncEvent event) throws IOException {
}
});
return asyncContext;
}
return null;
}
/** When recompilation is done, dispatchAll is called from LesscssResourceMapper.groovy **/
public void dispatchAll(String text) {
this.text = text;
if ( asyncContexts ) {
// Process all
while ( asyncContexts.size() ) {
AsyncContext asyncContext = asyncContexts.remove(0);
asyncContext.dispatch();
}
}
touched = false;
}
/** A touch of the lessfile will trigger a recompilation **/
int count = 0;
void touchFile(String uri) {
if ( Environment.current == Environment.DEVELOPMENT ) {
File file = getWebappFile(uri);
if (file && file.exists() ) {
++count;
if ( count < 5000 ) {
file << ' ';
}
else {
count = 0
file.write( file.getText().trim() )
}
}
}
}
static File getWebappFile(String uri) {
new File( Holders.getServletContext().getRealPath( uri ) )
}
}
In LesscssResourceMapper.groovy of the lesscss-resources plugin:
...
try {
lessCompiler.compile input, target
// Update mapping entry
// We need to reference the new css file from now on
resource.processedFile = target
// Not sure if i really need these
resource.sourceUrlExtension = 'css'
resource.contentType = 'text/css'
resource.tagAttributes?.rel = 'stylesheet'
resource.updateActualUrlFromProcessedFile()
// ==========================================
// Call made here!
// ==========================================
LessRefresh.FILES[resource.sourceUrl.toString()]?.dispatchAll( target.getText() );
} catch (LessException e) {
log.error("error compiling less file: ${originalFile}", e)
}
...
In the index.gsp file:
<g:set var="uri" value="${"${App.files.root}App/styles/empty.less"}"/>
<link media="screen, projection" rel="stylesheet" type="text/css" href="${r.resource(uri:uri)}" refresh="${g.createLink(controller:'home', action:'refresh', params:[uri:uri])}" resource="true">
JavaScript method refreshResources to replace the previous link href=...
/**
* Should only be used in development mode
*/
function refreshResources(o) {
o || (o = {});
var timeoutBegin = o.timeoutBegin || 1000;
var intervalRefresh = o.intervalRefresh || 1000;
var timeoutBlinkAvoid = o.timeoutBlinkAvoid || 400 ;
var maxErrors = o.maxErrors || 200 ;
var xpath = 'link[resource][type="text/css"]';
// Find all link[resource]
$(xpath).each(function(i, element) {
refresh( $(element) );
});
function refresh(element) {
var parent = element.parent();
var next = element.next();
var outer = element.clone().attr('href', '').wrap('<p>').parent().html();
var uri = element.attr('refresh');
var errorCount = 0;
function replaceLink() {
var link = $(outer);
link.load(function () {
// The link has been successfully added! Now remove the other ones, then do again
errorCount = 0;
// setTimeout needed to avoid blinking, we allow duplicates for a few milliseconds
setTimeout(function() {
var links = parent.find(xpath + '[refresh="'+uri+'"]');
var i = 0;
// Remove all but this one
while ( i < links.length - 1 ) {
links[i++].remove();
}
replaceLinkTimeout();
}, timeoutBlinkAvoid );
});
link.error(function(event, handler) {
console.log('Error refreshing: ' + outer );
++errorCount;
if ( errorCount < maxErrors ) {
// Load error, it happens. Remove this & redo!
link.remove();
replaceLink();
}
else {
console.log('Refresh: Aborting!')
}
});
link.attr('href', urlRandom(uri)).get(0);
link.insertBefore(next); // Insert just after
}
function urlRandom(uri) {
return uri + "&rand=" + Math.random();
}
function replaceLinkTimeout() {
setTimeout(function() {
replaceLink();
}, intervalRefresh ) ;
}
// Wait 1s before triggering the interval
setTimeout(function() {
replaceLinkTimeout();
}, timeoutBegin);
}
};
Comments
I am unsure why JavaScript-style promises have not been added to the Grails stack.
You cannot render or do anything similar in onComplete: render, redirect and the like are not available there.
Something tells me that Grails and promises/futures are not there yet. The design of the GPars libraries does not seem to account for the core feature, which is resolving later. At least it is not simple to do so.
It would be great if dispatch() could be invoked with some parameters passed from the resolving context. I was able to work around this using static properties.
I might continue to write my own solution and possibly contribute a more fitting one built around the AsyncContext class, but for now this is enough for me.
I just wanted to refresh my less resources automatically.
Phew...
EDIT:
I made it support multiple files. It is complete now!

Zf2 MasterSlaveFeature different Slaves

Is it possible to use different slave connections with the MasterSlaveFeature in a TableGateway?
I successfully managed to implement the MasterSlaveFeature with one Slave.

I don't think it is possible. If you look at the MasterSlaveFeature code, you will see that only one $slaveAdapter is supported.
If you would like to have a "MultipleMasterSlaveFeature", I guess you have to write it yourself.
You should have a good strategy in mind for selecting one of your slave adapters, e.g. random selection, time-dependent selection or another strategy.
Such a class could look like the following code:
class MultipleMasterSlaveFeature extends AbstractFeature {
// Array with AdapterInterface objects
protected $slaveAdapters = null;
//
protected $masterSql = null;
//
protected $slaveSql = null;
/**
*
* @param array $slaveAdapters
* @param Sql $slaveSql
*/
public function __construct(array $slaveAdapters, $slaveSql = null)
{
$this->slaveAdapters = $slaveAdapters;
if ($slaveSql) {
$this->slaveSql = $slaveSql;
}
}
// ...
/**
* after initialization, retrieve the original adapter as "master"
*/
public function postInitialize()
{
// Select one of the specified slave adapters
// .. depending on timestamp
$selectedSlaveAdapter = (count($this->slaveAdapters) == 0) ? null : $this->slaveAdapters[time() % count($this->slaveAdapters)];
// ..from MasterSlaveFeature class
$this->masterSql = $this->tableGateway->sql;
if ($this->slaveSql == null) {
$this->slaveSql = new Sql(
$selectedSlaveAdapter,
$this->tableGateway->sql->getTable(),
$this->tableGateway->sql->getSqlPlatform()
);
}
}
// .. preSelect() and postSelect() from MasterSlaveFeature class
}

Memory issue when using CGImage.ScreenImage in a loop using Mono Touch

I'm trying to create an app to read QR codes using Monotouch and C# port of Zxing but I'm hitting memory issues. While the app processes captured screen frames the app receives memory warnings and is then shut down. I have removed the call to Zxing to track down where the memory issue stems from and can reproduce the issue with just capturing the screen image in a loop.
Here is the code:
using System;
using System.Drawing;
using System.Collections.Generic;
using System.Threading;
using MonoTouch.UIKit;
using MonoTouch.Foundation;
using MonoTouch.CoreGraphics;
using com.google.zxing;
using com.google.zxing.common;
using System.Collections;
using MonoTouch.AudioToolbox;
using iOS_Client.Utilities;
namespace iOS_Client.Controllers
{
public class CameraOverLayView : UIView
{
private Thread _thread;
private CameraViewController _parentViewController;
private Hashtable hints;
private static com.google.zxing.MultiFormatReader _multiFormatReader = null;
private static RectangleF picFrame = new RectangleF(0, 146, 320, 157);
private static UIImage _theScreenImage = null;
public CameraOverLayView(CameraViewController parentController) : base()
{
Initialize();
_parentViewController = parentController;
}
private void Initialize()
{
}
private bool Worker()
{
Result resultb = null;
if(DeviceHardware.Version == DeviceHardware.HardwareVersion.iPhone4
|| DeviceHardware.Version == DeviceHardware.HardwareVersion.iPhone4S)
{
picFrame = new RectangleF(0, 146*2, 320*2, 157*2);
}
if(hints==null)
{
var list = new ArrayList();
list.Add (com.google.zxing.BarcodeFormat.QR_CODE);
hints = new Hashtable();
hints.Add(com.google.zxing.DecodeHintType.POSSIBLE_FORMATS, list);
hints.Add (com.google.zxing.DecodeHintType.TRY_HARDER, true);
}
if(_multiFormatReader == null)
{
_multiFormatReader = new com.google.zxing.MultiFormatReader();
}
using (var screenImage = CGImage.ScreenImage.WithImageInRect(picFrame))
{
using (_theScreenImage = UIImage.FromImage(screenImage))
{
Bitmap srcbitmap = new System.Drawing.Bitmap(_theScreenImage);
LuminanceSource source = null;
BinaryBitmap bitmap = null;
try {
source = new RGBLuminanceSource(srcbitmap, screenImage.Width, screenImage.Height);
bitmap = new BinaryBitmap(new HybridBinarizer(source));
try {
_multiFormatReader.Hints = hints;
resultb = null;
//_multiFormatReader.decodeWithState(bitmap);
if(resultb != null && resultb.Text!=null)
{
InvokeOnMainThread( () => _parentViewController.BarCodeScanned(resultb));
}
}
catch (ReaderException re)
{
//continue;
}
} catch (Exception ex) {
Console.WriteLine(ex.Message);
}
finally {
if(bitmap!=null)
bitmap = null;
if(source!=null)
source = null;
if(srcbitmap!=null)
{
srcbitmap.Dispose();
srcbitmap = null;
}
}
}
}
return resultb != null;
}
public void StartWorker()
{
if(_thread==null)
{
_thread = new Thread(()=> {
bool result = false;
while (result == false)
{
result = Worker();
Thread.Sleep (67);
}
});
}
_thread.Start();
}
public void StopWorker()
{
if(_thread!=null)
{
_thread.Abort();
_thread = null;
}
//Just in case
_multiFormatReader = null;
hints = null;
}
protected override void Dispose(bool disposing)
{
StopWorker();
base.Dispose(disposing);
}
}
}
Interestingly, I took a look at http://blog.reinforce-lab.com/2010/02/monotouchvideocapturinghowto.html to see how others were capturing and processing video, and that code suffers from the same problem as mine, quitting after about 40 seconds with memory warnings.
Hopefully the QR codes will be scanned in less than 40 seconds but I'm not sure if the memory ever gets released so the problem may crop up after many codes have been scanned. Either way it should be possible to capture a video feed continuously without memory issues right?

This is somewhat counter-intuitive, but the ScreenImage property will create a new CGImage instance every time you call it, so you must call Dispose on that object as well:
using (var img = CGImage.ScreenImage) {
using (var screenImage = img.WithImageInRect(picFrame))
{
}
}

I will just add the actual solution that worked for me which combined information from previous answers. The code inside the loop looks like:
using (var pool = new NSAutoreleasePool ())
{
using (var img = CGImage.ScreenImage)
{
using (var screenImage = img.WithImageInRect(picFrame))
{
using (_theScreenImage = UIImage.FromImage(screenImage))
{
}
}
}
}
GC.Collect();

The original System.Drawing.Bitmap from zxing.MonoTouch suffered from a lack of Dispose which made it never release the unmanaged memory it allocated.
The more recent one (from your link) does free the unmanaged memory when Dispose is called, which is better. However, it creates a bitmap context in its constructor and does not dispose of it manually (e.g. with a using), so it relies on the garbage collector (GC) to do it later...
In many cases this is not a big issue since the GC will, eventually, free this context instance and will reclaim the associated memory. However if you're doing this in a loop it's possible you'll run out of (unmanaged) memory before the GC kicks in. That will get you memory warnings and iOS can decide to kill your application (or it could crash by itself).
"but I'm not sure if the memory ever gets released"
Yes, it should be - but maybe not as fast as you need the memory back. Implementing (and using) IDisposable correctly will solve this.
"Either way it should be possible to capture a video feed continuously without memory issues right?"
Yes. Make sure you're releasing your memory as soon as possible, e.g. with using (var ...) { }, and ensure the 3rd party code you use does the same.
