I throttle events for all sinks registered in a Serilog logger: only 5 events per minute are allowed to be emitted by any sink.
I need a record (a Serilog event) in all the sinks that the throttling actually took place, otherwise I don't know if the exact max. allowed amount of events has been logged per minute or more events have been throttled.
I use my own throttle approach similar to the "official" PoC. I create an ILogEventFilter, and in its IsEnabled method I check the current rate of the events and want to write my "throttling happened" event to all the sinks directly (otherwise, e.g. if I write this event via the logger, I'm in endless recursion):
public class EventFilter : ILogEventFilter
{
public bool IsEnabled(LogEvent logEvent)
{
var enabled = ThrottledInvoker<string>.Invoke( // internally counts the calls, the method signature is simplified for the sake of this code sample
() => // method to call if throttling takes place
{
// send additional events/records into the sinks about throttling
var throttlingInfoEvent = new LogEvent(
logEvent.Timestamp,
logEvent.Level,
null,
MessageTemplate.Empty,
new[] {
new LogEventProperty("CustomMessage", new ScalarValue("Too many events of this kind.")),
});
// TODO: how to write to registered sinks?
sink1.Emit(throttlingInfoEvent);
sink2.Emit(throttlingInfoEvent);
// ...
// sinkN.Emit(throttlingInfoEvent);
},
TimeSpan.FromMinutes(1), // time period
5); // max. allowed event count per time period
return enabled;
}
}
I have found no Logger.Sinks property or anything alike. I understand that I can use reflection (issue on GitHub) to get private _sinks fields of my logger's aggregated sinks, but that smells.
I could initialize my filter before passing it to LoggerConfiguration.Filter.With() method, and pass all the sinks as my filter's constructor arguments. But the logger and sinks are configured by another component of the application which I don't have access to.
Is there a right/official way to do what I want?
Related
Apache Beam has recently introduced state cells, through StateSpec and the #StateId annotation, with partial support in Apache Flink and Google Cloud Dataflow.
I cannot find any documentation on what happens when this is used with a GlobalWindow. In particular, is there a way to have a "state garbage collection" mechanism to get rid of states for keys that have not been seen for a while according to some configuration, while still maintaining a single all-time state for keys are that seen frequently enough?
Or, is the amount of state used in this case going to diverge, with no way to ever reclaim state corresponding to keys that have not been seen in a while?
I am also interested in whether a potential solution would be supported in either Apache Flink or Google Cloud Dataflow.
Flink and direct runners seem to have some code for "state GC" but I am not really sure what it does and whether it is relevant when using a global window.
State can be automatically garbage collected by a Beam runner at some point after a window expires - when the input watermark exceeds the end of the window by the allowed lateness, so all further input is droppable. The exact details depend on the runner.
As you correctly determined, the Global window may never expire. Then this automatic collection of state will not be invoked. For bounded data, including drain scenarios, it actually will expire, but for a perpetual unbounded data source it will not.
If you are doing stateful processing on such data in the Global window you can use user-defined timers (used through #TimerId, #OnTimer, and TimerSpec - I haven't blogged about these yet) to clear state after some timeout of your choosing. If the state represents an aggregation of some sort, then you'll want a timer anyhow to make sure your data is not stranded in state.
Here is a quick example of their use:
new DoFn<Foo, Baz>() {
private static final String MY_TIMER = "my-timer";
private static final String MY_STATE = "my-state";
#StateId(MY_STATE)
private final StateSpec<ValueState<Bizzle>> =
StateSpec.value(Bizzle.coder());
#TimerId(MY_TIMER)
private final TimerSpec myTimer =
TimerSpecs.timer(TimeDomain.EVENT_TIME);
#ProcessElement
public void process(
ProcessContext c,
#StateId(MY_STATE) ValueState<Bizzle> bizzleState,
#TimerId(MY_TIMER) Timer myTimer) {
bizzleState.write(...);
myTimer.setForNowPlus(...);
}
#OnTimer(MY_TIMER)
public void onMyTimer(
OnTimerContext context,
#StateId(MY_STATE) ValueState<Bizzle> bizzleState) {
context.output(... bizzleState.read() ...);
bizzleState.clear();
}
}
There is not automatic garbage collection of state if you use GlobalWindows. Only if you use some non-global window will state be garbage collected after the watermark passes the end of a window plus the allowed lateness.
What you can do if you must work with GlobalWindows is to manually keep as state the last update timestamp. Then you would periodically set a timer where you check this timestamp against the current time and delete state if necessary. You would set this timer when encountering a key for the first time (which you can see from the absence of your timestamp state) and then re-set it in the #OnTimer method.
Over the last couple of years, I have done a fair amount of work on Amazon SWF, but the following points are still unclear to me and I am not able to find any straight forward answers on any forums yet.
These are pretty basic requirements I suppose, sure others might have come across too. Would be great if someone can clarify these.
Is there a simple way to return a workflow execution result (maybe just something as simple as boolean) back to workflow starter?
Is there a way to catch Activity timeout exception, so that we can do run customised actions in such scenarios?
Why doesn't WorkflowExecutionHistory contains Activities, why just Events?
Why there is no simple way of restarting a workflow from the point it failed?
I am considering to use SWF for more business processes at my workplace, but these limitations/doubts are holding me back!
FINAL WORKING SOLUTION
public class ReturnResultActivityImpl implements ReturnResultActivity {
SettableFuture future;
public ReturnResultActivityImpl() {
}
public ReturnResultActivityImpl(SettableFuture future) {
this.future = future;
}
public void returnResult(WorkflowResult workflowResult) {
System.out.print("Marking future as Completed");
future.set(workflowResult);
}
}
public class WorkflowResult {
public WorkflowResult(boolean s, String n) {
this.success = s;
this.note = n;
}
private boolean success;
private String note;
}
public class WorkflowStarter {
#Autowired
ReturnResultActivityClient returnResultActivityClient;
#Autowired
DummyWorkflowClientExternalFactory dummyWorkflowClientExternalFactory;
#Autowired
AmazonSimpleWorkflowClient swfClient;
String domain = "test-domain;
boolean isRegister = true;
int days = 7;
int terminationTimeoutSeconds = 5000;
int threadPollCount = 2;
int taskExecutorThreadCount = 4;
public String testWorkflow() throws Exception {
SettableFuture<WorkflowResult> workflowResultFuture = SettableFuture.create();
String taskListName = "testTaskList-" + RandomStringUtils.randomAlphabetic(8);
ReturnResultActivity activity = new ReturnResultActivityImpl(workflowResultFuture);
SpringActivityWorker activityWorker = buildReturnResultActivityWorker(taskListName, Arrays.asList(activity));
DummyWorkflowClientExternalFactory factory = new DummyWorkflowClientExternalFactoryImpl(swfClient, domain);
factory.getClient().doSomething(taskListName)
WorkflowResult result = workflowResultSettableFuture.get(20, TimeUnit.SECONDS);
return "Call result note - " + result.getNote();
}
public SpringActivityWorker buildReturnResultActivityWorker(String taskListName, List activityImplementations)
throws Exception {
return setupActivityWorker(swfClient, domain, taskListName, isRegister, days, activityImplementations,
terminationTimeoutSeconds, threadPollCount, taskExecutorThreadCount);
}
}
public class Workflow {
#Autowired
private DummyActivityClient dummyActivityClient;
#Autowired
private ReturnResultActivityClient returnResultActivityClient;
#Override
public void doSomething(final String resultActivityTaskListName) {
Promise<Void> activityPromise = dummyActivityClient.dummyActivity();
returnResult(resultActivityTaskListName, activityPromise);
}
#Asynchronous
private void returnResult(final String taskListname, Promise waitFor) {
ActivitySchedulingOptions schedulingOptions = new ActivitySchedulingOptions();
schedulingOptions.setTaskList(taskListname);
WorkflowResult result = new WorkflowResult(true,"All successful");
returnResultActivityClient.returnResult(result, schedulingOptions);
}
}
The standard pattern is to host a special activity in the workflow starter process that is used to deliver the result. Use a process specific task list to make sure that it is routed to a correct instance of the starter. Here are the steps to implement it:
Define an activity to receive the result. For example "returnResultActivity". Make this activity implementation to complete the Future passed to its constructor upon execution.
When the workflow is started it receives "resultActivityTaskList" as an input argument. At the end the workflow calls this activity with a workflow result. The activity is scheduled on the passed task list.
The workflow starter creates an ActivityWorker and an instance of a Future. Then it creates an instance of "returnResultActivity" with that future as a constructor parameter.
Then it registers the activity instance with the activity worker and configures it to poll on a randomly generated task list name. Then it calls "start workflow execution" passing the generated task list name as an input argument.
Then it wait on the Future to complete. The future.get() is going to return the workflow result.
Yes, if you are using the AWS Flow Framework a timeout exception is thrown when activity is timed out. If you are not using the Flow framework than you are making your life 100 times harder. BTW the workflow timeout is thrown into a parent workflow as a timeout exception as well. It is not possible to catch a workflow timeout exception from within the timing out instance itself. In this case it is recommended to not rely on workflow timeout, but just create a timer that would fire and notify workflow logic that some business event has timed out.
Because a single activity execution has multiple events associated to it. It should be pretty easy to write code that converts history to whatever representation of activities you like. Such code would just match the events that relate to each activities. Each event always has a reference to the related events, so it is easy to roll them up into higher level representation.
Unfortunately there is no easy answer to this one. Ideally SWF would support restarting workflow by copying its history up to the failure point. But it is not supported. I personally believe that workflow should be written in a way that it never fails but always deals with failures without failing. Obviously it doesn't work in case of failures due to unexpected conditions. In this case writing workflow in a way that it can be restarted from the beginning is the simplest approach.
Is it possible to have the DataFlow process maintain the state. There are log processing tools that allow for that by providing fast access (propriety / in-memory) files available for the real time process to keep track of the state on the logs while processing them.
A use case example would be with tracking registration steps taken by users. The registration steps would come in different logs and the data form those logs would be assembled by the real time process into one final database record (for each registered user) that is written to a database.
Can my DataFLow code keep track of the many registration steps (streaming input) by users and once user's registration steps are completed then have the DataFLow process write the records to the database (one record per user).
I don't know much about DataFlow architecture. It must be using some (proprietary / in-memory nosql) data storage for keeping track of things it needs to keep track of (ex. when it tries to produce top 100 customers). Is that fast access data storage also available to the DataFlow processes to use?
Thanks
As danielm said, state is not yet exposed. The good news is you may not need it for your use case.
If you have a PCollection<KV<UserId, LogEvent>> you can use a CombineFn and Combine.perKey to take all of the LogEvents for a specific UserId and combine them into a single output. The CombineFn tells Dataflow how to create an accumulator, update it by incorporating input elements, and then extract a final output. Transforms like Top actually use a CombineFn (with a Heap as the accumulator) rather than an actual state API.
If your events are of different types, you can still do something like this. For instance, if you have two logs, you can do:
PCollection<KV<UserId, LogEvent1>> events1 = ...;
PCollection<KV<UserId, LogEvent2>> events2 = ...;
// Create tuple tags for the value types in each collection.
final TupleTag<LogEvent1> tag1 = new TupleTag<LogEvent1>();
final TupleTag<LogEvent2> tag2 = new TupleTag<LogEvent2>();
//Merge collection values into a CoGbkResult collection
PCollection<KV<UserIf, CoGbkResult>> coGbkResultCollection =
KeyedPCollectionTuple.of(tag1, pt1)
.and(tag2, pt2)
.apply(CoGroupByKey.<UserId>create());
// Access results and do something.
PCollection<T> finalResultCollection =
coGbkResultCollection.apply(ParDo.of(
new DoFn<KV<K, CoGbkResult>, T>() {
#Override
public void processElement(ProcessContext c) {
KV<K, CoGbkResult> e = c.element();
// Get all LogEvent1 values
Iterable<LogEvent1> event1s = e.getValue().getAll(tag1);
// There will only be one LogEvent2
LogEvent2 event2 = e.getValue().getOnly(tag2);
... Do Something to compute T ....
c.output(...some T...);
}
}));
The above example was adapted from docs on CoGroupByKey which have information.
Dataflow does not currently expose the underlying state mechanism that it uses. However, this is definitely on the radar for a future update.
I have a need to manage a global state of a number of web components on my web page. (e.g. Each web component has a "select" button/function and I track components to make sure only one component is selected at a time.)
To manage the global state of my components, each web component provides a stream of events to a common handler in my main web app. Unfortunately, I need my handler to know which stream/web-component it was called from in order to manage the global state. How can my handler get this information?
Here is my sample code:
// _webComponents is a list of references to each component.
// getStream() gets a stream of events from each component.
// connections is a list of streams from all my web components.
_webComponents.forEach((connection)=>connections.add(connection.getStream()));
connections.forEach((Stream<String> connection)=>connection.listen(eventHandler));
void eventHandler(webComponentEvent){
// ?? How do i find out which web component the event came from ??
// ToDo: use event and web component info to manage a global state of web components.
}
If I understood correctly, you want to know the sender in your handler, right?
There are two options. The first one is to send the sender as a part of data:
class Dog { // controller and stream initialization removed for brevity
Stream get onBark => ...;
void bark(){
// of course, you can have a typed object instead of map
_barkController.add({'sender': this, 'data': 'woof'});
}
}
// attach to source
var dog = new Dog();
dog.onBark.listen((event) {
sender = event['sender'];
data = event['data'];
...
});
Another option is to bind the sender in the closure. This doesn't require you to change the type of stream (so you will still have Stream<String> instead of Stream<Map>:
sources.forEach((src) => src.listen((data) => handleEvent(src, data)));
void handleEvent(Connection sender, String data) {
...
}
I have class like this :
class BaseModel {
Map objects;
// define constructor here
fetch() {
// fetch json from server and then load it to objects
// emits an event here
}
}
Like backbonejs i want to emits a change event when i call fetch and create a listener for change event on my view.
But from reading the documentation, i don't know where to start since there are so many that points to event, like Event Events EventSource and so on.
Can you guys give me a hint?
I am assuming you want to emit events that do not require the presence of dart:html library.
You can use the Streams API to expose a stream of events for others to listen for and handle. Here is an example:
import 'dart:async';
class BaseModel {
Map objects;
StreamController fetchDoneController = new StreamController.broadcast();
// define constructor here
fetch() {
// fetch json from server and then load it to objects
// emits an event here
fetchDoneController.add("all done"); // send an arbitrary event
}
Stream get fetchDone => fetchDoneController.stream;
}
Then, over in your app:
main() {
var model = new BaseModel();
model.fetchDone.listen((_) => doCoolStuff(model));
}
Using the native Streams API is nice because it means you don't need the browser in order to test your application.
If you are required to emit a custom HTML event, you can see this answer: https://stackoverflow.com/a/13902121/123471
There's a package for it:
https://pub.dev/packages/event
This will be better than using Streams as 'event' is more readable