Setup:
read from Pub/Sub -> window of 30s -> group by user -> combine -> write to Cloud Datastore
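For reference, here is a minimal sketch of this topology in the Beam Python SDK (the topic name and the event parser are placeholders, not my actual code):

```python
import apache_beam as beam
from apache_beam import window
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

def parse_event(msg):
    # Hypothetical payload format: b"user_id,value"
    user_id, value = msg.decode('utf-8').split(',')
    return user_id, int(value)

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as p:
    aggregates = (
        p
        | 'Read' >> beam.io.ReadFromPubSub(topic='projects/my-project/topics/events')
        | 'Parse' >> beam.Map(parse_event)
        | 'Window' >> beam.WindowInto(window.FixedWindows(30))  # 30-second windows
        | 'Combine' >> beam.CombinePerKey(sum))                 # group by user + combine
    # ...followed by the Datastore write that produces the errors below
```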
Problem:
I'm seeing DatastoreIO writer errors because entities with the same key are present in the same transaction.
Question:
I want to understand how my pipeline combines results into bundles after a group-by/combine operation. I would expect a bundle to be created for every window after the combine, but apparently a bundle can contain multiple occurrences of the same user?
Can re-execution (retries) of bundles cause this behavior?
Is this bundling dependent on the runner?
Is deduplication an option? If so, how would I best approach that?
Note that I'm not looking for a replacement for the Datastore writer at the end of the pipeline; I already know we can use a different strategy there. I'm merely trying to understand how the bundling happens.
There are two answers to your question. One is specific to your use case; the other is about how bundling and windowing work in streaming in general.
Specific to your pipeline
I am assuming that the 'key' for Datastore is the user ID? In that case, if you have events from the same user in more than one window, your GroupByKey or Combine operations will produce a separate element for every user+window pair.
So the question is: What are you trying to insert into datastore?
An individual user's resulting aggregate over all time? In that case, you'd need to use a Global Window.
A user's resulting aggregate for every 30 seconds in time? Then you need to use the window as part of the key you use to insert into Datastore. Does that help / make sense?
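For example, here is a minimal sketch (Beam Python SDK; the key format is just an illustration) of folding the window into the key right after the combine:

```python
import apache_beam as beam

class KeyByUserAndWindow(beam.DoFn):
    """Re-keys each per-window aggregate so the Datastore key is unique
    per user *and* window, rather than per user alone."""

    def process(self, element, window=beam.DoFn.WindowParam):
        user_id, aggregate = element
        # window.end marks the 30s window this aggregate belongs to.
        # Folding it into the key yields one distinct entity per
        # user+window pair, so no two mutations in a batch share a key.
        key = '%s|%s' % (user_id, window.end.to_utc_datetime().isoformat())
        yield key, aggregate
```

Because the keys are deterministic, retried bundles simply overwrite the same entities, which is also one way to think about your deduplication question.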
Happy to help you design your pipeline to do what you want. Chat with me in the comments or via SO chat.
The larger question about bundling of data
Bundling strategies will vary by runner. In Dataflow, you should consider the following two factors:
Every worker is assigned a key range. Elements for the same key will be processed by the same worker.
Window assignment is per element, but a bundle may contain elements from multiple windows. As an example, if the data freshness metric makes a big jump*, a number of windows may be triggered, and elements with the same key in different windows would be processed in the same bundle.
* When can data freshness jump suddenly? A stream containing a single element with a very old timestamp that is very slow to process may hold the watermark back for a long time. Once this element is processed, the watermark may jump a lot, to the next-oldest element (check out this lecture on watermarks ;)).
How do I stop MetaTrader Terminal 4 offline chart from updating the price on its own?
I want to update the price on my own because of the timezone difference with my broker. I have checked all the properties and the MQL4 forum. No luck.
For truly offline charts, there is a way
While regular charts process an independent event flow received from the MT4 server, there is a way to retain your own control over the TOHLCV data records, including timezone shifts, synthetic bar additions, and other adaptations as needed.
You may create your own, transformed TOHLCV history and import these records via the F2 facility, called the History Centre in MT4.
How to avoid live-quote-stream updates in MetaTrader Terminal 4
The simplest-ever way is not to log in to any trading server. This will prevent unwanted updates from reaching your local FxQuoteStreamPROCESSOR.
There used to be a way to inject fake QuoteStreamDATA into a local MT4. However, this enters a gray, if not black, zone: MetaQuotes, Inc. postulates the server/terminal protocol to be protected IP and considers any attempt to reverse-engineer it an unlawful violation of their rights, which could carry legal consequences, so be careful about stepping there. Anyway, it is a doable approach, with the explicit risk warning presented above.
Can't be done. Quotes get fed in from the MT4 server and get "evented" into the MetaTrader terminal.
On each new bar/tick, my variable is re-initialized. I am trying to execute one trade per signal; the problem is that once TP is achieved, if the same trend continues, another trade is triggered. I am thinking of storing the variable in a text file, so I'm wondering what would be the best way to handle such a variable. Sorry, I don't have code.
MT4 Global Variable objects
While MT4 supports somewhat ghost-like, semi-persistent objects called "Global Variables", which can survive MT4 Terminal restarts for about four weeks, these ghosts are rather complicated to use for your sketched purpose.
GlobalVariableCheck()
GlobalVariableSet()
GlobalVariableSetOnCondition()
GlobalVariableGet()
FileSystem Text-File
While doable, this ought to be a last-resort option only, as it is the slowest and the least manageable part. Once you run several, several tens, or several hundreds of MT4 Terminal instances in the same environment, the risk of file I/O collisions is clearly visible.
Solution?
Try to create & maintain a singleton pattern to avoid multiple re-entries into a trend you have already put one trade into.
Also set up a clear definition of trend reversal, which stops / resets the singleton once a new trend has formed.
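The logic itself is language-agnostic. A minimal sketch in Python for illustration (in MQL4 the flag would be persisted via the GlobalVariable*() functions listed above; signal and reversal detection are assumed to exist elsewhere):

```python
class TrendLatch:
    """Fires at most one trade per trend; resets on a trend reversal."""

    def __init__(self):
        self.traded_this_trend = False

    def on_signal(self):
        # Place a trade only on the first signal of the current trend.
        if self.traded_this_trend:
            return False
        self.traded_this_trend = True
        return True

    def on_reversal(self):
        # A confirmed trend reversal re-arms the latch for the new trend.
        self.traded_this_trend = False

latch = TrendLatch()
assert latch.on_signal()         # first signal: trade
assert not latch.on_signal()     # same trend continues after TP: no re-entry
latch.on_reversal()              # new trend formed
assert latch.on_signal()         # one trade in the new trend
```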
How can I make a 30-day trial for my application? I need to allow users to use the application for only 30 days. How do I count these days?
I keep the first and the last date in the registry, but if the system time is changed, there is no protection. I need a reliable way to count these 30 days.
You could probably come up with a system that requires an internet connection, but without something that the user can't tamper with, I don't see a solution.
Any solution that relies on an untrusted element (an element of the protection that is under the user's control) is critically weak.
The simplest way I can think of to protect against the user moving the clock back is to limit the total number of launches.
However, limiting the number of launches requires persistence: saving data to the disk, perhaps encrypting and storing a modified version of your activation data file.
Imagine that you count one of the 30 days as "used up" once the app has been launched on a unique occasion, even when the same date is re-used. To avoid using up more than one "activation day" per launch, the user must allow your software to re-save its activation file each time it runs.
To block that approach, the user need only keep the apparent date from changing, and then either prevent you from storing anything to disk, or simply track and record your changes and reverse them out, using either a monitoring process or VMware snapshots. About VMware snapshots you can do nothing: the virtual machine's disk is not under your control.
You can protect your app against users setting the clock back simply by storing the date of the last execution in the registry.
Each time the app is started you need to do the following:
Check the current date (as reported by the system clock) against the stored last-execution date; if the current date is earlier than the last-execution date, consider the trial period expired (or whatever you prefer).
If the previous check passes, save the current date in the registry and continue execution (sketched below).
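A minimal sketch of these two steps, in Python for illustration (the question is about Delphi; winreg is Python's standard-library registry module, and the key path and value name are placeholders):

```python
import datetime
import winreg  # Windows-only standard-library module

KEY_PATH = r'Software\MyApp'  # hypothetical registry location
VALUE_NAME = 'LastRun'

def trial_clock_ok():
    today = datetime.date.today()
    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, KEY_PATH) as key:
            stored, _ = winreg.QueryValueEx(key, VALUE_NAME)
        last_run = datetime.date.fromisoformat(stored)
    except (OSError, ValueError):
        last_run = today  # first run: nothing (valid) stored yet
    if today < last_run:
        return False      # clock was moved back -> treat the trial as expired
    # Check passed: record the current date and continue execution.
    with winreg.CreateKey(winreg.HKEY_CURRENT_USER, KEY_PATH) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_SZ, today.isoformat())
    return True
```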
As WarrenP says, any technique that stores information locally can be easily circumvented using VMware snapshots.
And any of them, including those that check via the internet, can be bypassed via assembler-level hacking.
Here's a discussion on Shareware trial enforcement with Delphi:
Best Shareware lock for Delphi Win32
Along with discussions of various 3rd-party solutions, techniques for DIY, etc.
IMO, DIY is feasible if your app produces data that the user will want to keep around: you can then simply embed a copy of the usage/day counter in the database in such a way that they can't wipe it without destroying their data. I also like watermarking (printing "trial" on reports, etc.) and escalating nag severity, but I do not recommend or condone "drop-dead" crippling until WAY past the expiration date. I also like to measure "days of actual use" instead of using a calendar.
Registry manipulation works, and many of the 3rd-party protectors use it. But you need to be stealthy, and keep backups in several locations simultaneously.
You should also consider having separate trial and registered versions. But also consider that pirates will buy the registered version with a stolen card, and put it on Rapidshare, BitTorrent, etc..
Also note that elaborate defenses lead to support headaches. Sometimes PCs crash and the clock gets set backwards. Users install new hardware. PCs get rebuilt, restored from backup, etc. If a user is running a debugger, he may be a software developer, not a pirate. If your app looks like it has been patched, it may be an overly aggressive antivirus. And at any time, a shoddy patch for Windows may cause your program to think that it's being attacked, hacked, or reverse-engineered. You have been warned...
Encrypt a date and store it in the registry. The best way to do this is to have the installer itself store that date; if the date doesn't exist, the application should quit.
There is an open source project (which was a commercial product before):
TurboPower OnGuard is a library to create demo versions of your Borland Delphi & C++Builder applications. Create demo versions that are time-limited, feature-limited, limited to a certain number of uses, or limited to a certain # of concurrent network users.
I have not checked which Delphi versions are supported.
For this kind of "protection" and some others, I have used TmxProtector (open source) from MaxComponents in the past with good results. From the link provided:
The TmxProtector is a software protection component. It was designed for quick implementation of application protection functions. You can create time-trial and password-protected applications. You can set the maximum number of executions, and it can work with registration keys as well.
This component uses very simple encryption to store the expiration date in the registry, and it provides some simple detection of tampering with the system date.
It sounds like you need to store the date the last registry entry was written. Then, inside your program, test whether the current date is earlier than the date the last registry entry was made. If it is, display a message that the trial period has expired and the program must be purchased.
Here are some ideas on how to deal with clock changes during the trial period:
Save both the date of the first and the date of the last program start. If the date of the last program start is greater than the current date, then the user has moved the clock back. I simply advance it by a day and save the new date as the date of the last start. You can of course decide to just end the trial instead.
To try to defeat trial-bypass programs (RunAsDate, for example), which run your application with the date and time set to a specific value, you can, instead of getting the date the usual Delphi way (Date, Now), use, for example, the last modification date of NTUSER.DAT (see the sketch after this list).
Save your trial data in two separate locations, either two registry locations or a file and the registry. That way, even if the user deletes one of the trial-data locations, you'll still have a backup one to use.
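For point 2, here is a minimal sketch of reading that timestamp, in Python for illustration (the hive path is an assumption; the equivalent file-stat call exists in Delphi/WinAPI):

```python
import os
from datetime import datetime, timezone

# NTUSER.DAT is the per-user registry hive. Its last-modified time keeps
# moving with real activity, so it is harder for tools like RunAsDate
# (which only fake the clock your process sees) to spoof.
ntuser_path = os.path.join(os.environ['USERPROFILE'], 'NTUSER.DAT')
hive_mtime = datetime.fromtimestamp(os.path.getmtime(ntuser_path), tz=timezone.utc)
print('Reference time instead of the system clock:', hive_mtime)
```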
If you keep your trial info in the registry, the registry entries can be deleted by the user. Everyone expects to find the registration info there.
There is one place where the user might not think to look: your own app (the EXE file). Put an ANSI string constant (it MUST be ANSI/ASCII or another one-byte-per-character string, static array, etc.) into your program, like 'xyxyxyxyxyxy'. Compile your app. Open the compiled app with a hex editor and search for that string. Your program can now use that area to store the trial info inside itself.
Use this method in conjunction with others: also store your info in the registry, on disk, etc.
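A minimal sketch of locating that marker area, in Python for illustration (the marker matches the example above; note that Windows locks a running EXE against writes, so updating the area is typically done by an installer or a helper process):

```python
import sys

MARKER = b'xyxyxyxyxyxy'  # the ANSI constant compiled into the program

def find_trial_area(path):
    """Return the file offset of the marker and the bytes currently there."""
    with open(path, 'rb') as f:
        data = f.read()
    offset = data.find(MARKER)
    if offset < 0:
        raise ValueError('marker not found - the binary may have been altered')
    return offset, data[offset:offset + len(MARKER)]

# Demo on this very file; a real app would pass the path of its own EXE.
offset, blob = find_trial_area(sys.argv[0])
print('trial info area at file offset', offset, ':', blob)
```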
Anyway, the best option would be to get the registration info from your server.
The big drawbacks: 1. the server must ALWAYS be online; 2. the user must be connected to the internet (whenever they use your app).
Also use a Delphi license management library to help you encrypt the license info and generate a string-based key that you can send to your customers upon registration.
Anyway, whatever you send to your server needs to be based on the hardware fingerprint of that computer. Otherwise, your license key will leak out onto some warez website and everyone will be able to use that key. But if the key is hardware-based, it will be useless if it leaks onto the internet.
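A minimal sketch of such a fingerprint, in Python for illustration (which identifiers to hash is a design choice; these standard-library values stand in for real hardware IDs such as disk serial numbers):

```python
import hashlib
import platform
import uuid

def machine_fingerprint():
    # Combine a few machine-specific values and hash them, binding the
    # license key to this computer instead of making it reusable anywhere.
    parts = [
        platform.node(),              # host name
        platform.machine(),           # CPU architecture
        format(uuid.getnode(), 'x'),  # MAC-derived node id
    ]
    return hashlib.sha256('|'.join(parts).encode('utf-8')).hexdigest()

# Send machine_fingerprint() along with the registration request; the
# server then issues a key that is valid only for that fingerprint.
print(machine_fingerprint())
```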
Just remember: don't overdo it! There is no such thing as unbreakable software protection; even Microsoft could not do it!
As the thread linked above mentions, I encourage you to look into WinLicense: http://www.oreans.com.
I've been using it for quite some time and it handles trial periods quite well. It also handles licensing, customer lists, etc.
Tom
Suppose we are building an e-commerce site that allows consumers to search for products by typing in keywords. Say there are at most 200,000 products and millions of consumers using the system, and the product table is updated fairly frequently. Since the number of products is not that high, we can probably store the entire product table in memory and search against it instead of hitting the database. We are hoping to create distributed caches that store the same data but reside on different servers (for high-availability and performance reasons), and we need to be able to synchronize data among these caches and invalidate them when the product table is modified.
Our application is built using ASP.NET MVC and NHibernate. I am trying to understand whether NHibernate's level-2 caching would help in my situation. I would really appreciate it if you could shed some light on this.
I understand that level-2 caching helps cache query results, so if two different users search using the same keyword, the L2 cache will serve the result from the cache instead of from the database. But that doesn't help us much, since the product table is updated frequently and the cached results will be stale.
My question is: am I understanding L2 caching correctly, and does anything exist that helps manage caches the way I would like (multiple caches, the same data, synchronization between caches, and cache invalidation)? Any thoughts are highly appreciated.
Having used both the second-level cache (with the memcached provider) and the NHibernate.Search add-on, it seems to me you could benefit from both.
The NHibernate.Search component depends on Lucene.Net, and keyword search is decoupled from the database itself. A separate index file is created per mapped class, and optimizations can be set at the property level using attributes, giving you an extra level of granularity. Additionally, you can implement best-match queries and suggestions (check Lucene in Action and/or Hibernate Search in Action). As a note, you don't have to maintain the index (unless you explicitly request an index rebuild); the implementation manages everything behind the scenes, although you can manipulate the index if you wish to do so. So adding/deleting/updating a product will automatically update the corresponding index.
For the second-level cache you get an instant performance boost. On a test environment with a data set of approximately 2 million rows, I saw more than 20% improvement even at an extremely low request count. The performance boost grows as the request count increases: the application first hits the 2nd-level cache, and if the entities are not found there, it hits the DB to fetch the required rows and inserts them into the cache for future queries. Again, you can manage things like cache duration and other configuration settings, as well as explicitly clear the cache (all of it, a part of it, or particular entries) if you wish to do so. Note that cache state is managed by the application during save/update/delete.
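To illustrate the read-through behavior just described (a generic Python sketch only; NHibernate's actual second-level cache is configured through its cache providers, not hand-rolled like this):

```python
class ReadThroughCache:
    """Hit the cache first; on a miss, hit the DB and remember the result."""

    def __init__(self, fetch_from_db):
        self._store = {}
        self._fetch = fetch_from_db  # e.g. a function running a SQL query

    def get(self, key):
        if key not in self._store:               # cache miss
            self._store[key] = self._fetch(key)  # fetch and cache for next time
        return self._store[key]

    def invalidate(self, key=None):
        # Clear one entry, or the whole cache when no key is given;
        # the save/update/delete paths would call this.
        if key is None:
            self._store.clear()
        else:
            self._store.pop(key, None)
```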
For scalability:
* the 2nd-level cache depends on the provider (i.e., memcached is highly performant and scalable and supports distributed instances).
* for Lucene.Net/NHibernate.Search you will need to set up a specific place where the indexes will reside, and that place must be accessible for read/write by all web-application instances. Note that the sensitive link here is I/O and file contention, so setting up a machine with a faster-than-light file system will prevent that from happening (I am speaking of your scenario with many thousands of search requests per second).
As a side note, I would highly recommend NHibernate.Search, since it is dramatically faster than LIKE queries and easier to use than implementing SQL Server's full-text search inside the application (which I have done).
Whether a second level cache will help depends on exactly how frequently your product table is updated in relation to cache hits. If you add 100 new products an hour but receive 10,000 queries an hour, even a 10% cache hit rate will make a big difference. If the rates are reversed, a second level cache will be of almost no value.
I suggest you set up a stress test environment that closely approximates your production environment and perform benchmarking on the various second level cache providers.
Also check that your DB is configured properly for an update-heavy scenario.
I recommend using NHibernate.Search with Lucene. It works together with the 2nd-level cache. Lucene can do sophisticated text searching ripping fast and then return the entity keys to NHibernate, which pulls the full entities out of its 2nd-level cache. The NHibernate.Search extension does the work of keeping your Lucene index in sync.
TekPub did a recent episode on your exact scenario of searching product descriptions. The episode compares NHibernate queries, SQL Full-text indexing and Lucene w/ NHibernate.Search.