Tracking/Monitoring sudden trend changes

I track a lot of things with RRD, e.g. uptime, network throughput, etc. This works well when you can fit all the graphs on a single page, but once you scale beyond a page it becomes difficult to catch issues from graphs: you have to look at each one to see that something is wrong, and with hundreds or thousands of graphs that obviously isn't possible.
So, is there any standard way, or existing software, for monitoring RRD databases for trend changes? For example, every day network traffic looks pretty much the same; if it spikes or dips dramatically in a single hour/day/week compared to the norm, I'd like to be alerted to it.
Or even just generic methods for finding changes in trends.

You can read the RRD file directly rather than relying on the generated graphs. You might need to write your own app to do this, but the file format is an open standard, so it shouldn't be that difficult to get what you need.
RRD File Format
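For example, a minimal sketch using the python-rrdtool bindings (the file name, data-source names and time window are placeholders for your own setup); you could run something like this on a schedule and compare each value against whatever baseline you choose:

```python
# Read raw values straight out of an RRD instead of looking at graphs.
import rrdtool

# fetch() returns ((start, end, step), data-source names, rows of values)
(start, end, step), names, rows = rrdtool.fetch(
    "traffic.rrd", "AVERAGE", "--start", "-86400"   # last 24 hours
)

for i, row in enumerate(rows):
    timestamp = start + i * step
    # each row holds one value per data source; None means "unknown"
    print(timestamp, dict(zip(names, row)))
```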

It looks like RRDtool actually supports this via its aberrant behavior detection (Holt-Winters forecasting):
http://cricket.sourceforge.net/aberrant/rrd_hw.htm
I'd be interested in hearing whether anyone has used it.
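For anyone evaluating it, the feature is enabled by adding an HWPREDICT RRA when the database is created. A rough sketch with the python-rrdtool bindings follows; the step size, tuning constants and data-source name are illustrative only, and the FAILURES fetch at the end assumes your RRDtool version allows fetching that RRA directly (otherwise graph it with a DEF using the FAILURES CF):

```python
# Enable RRDtool's built-in Holt-Winters aberrant behavior detection.
import rrdtool

rrdtool.create(
    "traffic.rrd",
    "--step", "300",                      # one sample every 5 minutes
    "DS:octets:COUNTER:600:0:U",
    "RRA:AVERAGE:0.5:1:2016",             # a week of 5-minute averages
    # 1440 rows of forecasts, alpha=0.1, beta=0.0035, seasonal period =
    # 288 samples (one day of 5-minute steps). RRDtool implicitly creates
    # the SEASONAL, DEVSEASONAL, DEVPREDICT and FAILURES RRAs it needs.
    "RRA:HWPREDICT:1440:0.1:0.0035:288",
)

# The FAILURES RRA stores 0/1 flags marking confidence-band violations,
# which is what you would alert on.
(start, end, step), names, rows = rrdtool.fetch(
    "traffic.rrd", "FAILURES", "--start", "-86400"
)
alerts = [start + i * step for i, row in enumerate(rows) if row and row[0]]
```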

Related

How to use neo4j effectively for serious, repeatable analysis over time

New to Neo4j, and I love the browser for exploratory work. But I'm unsure how best to use it to achieve, for lack of a better term, real work. Consider a sample project involving:
Importing 4 different CSV files
Creating appropriate relationships between nodes
Doing a variety of complex queries to derive data that I'll export for statistical analysis using another program.
I need to be able to replicate the project in the future, as well as add new data, calculate different derived data, etc. I also need to be able to share the code so others can extend/verify it.
For non-relational data, I'd use something like R, Stata or SAS. While each allows interactive exploration like the Neo4j browser, I'd never use that for serious analysis. Instead, I'd save a file or files of commands that I could modify and rerun whenever I needed to.
Neo4j's browser doesn't seem to support any of this. Unless I am missing something, it doesn't even let you save a "session" along the lines of an IPython/Jupyter notebook. I know there is a neo4j-shell, but especially since they have dropped it from the standard desktop installation (and gotten rid of the console), I feel like I must be doing something wrong, or at least working contrary to the designers' intent, if I can't do serious work in the browser. Clearly, lots of people are.
Can anyone point me in the right direction? How does one best develop an extensive, replicable project over time with neo4j? Thank you.
You can take your pick of several officially supported language drivers to integrate Neo4j into basically any other project structure, including Jupyter notebooks. I'm not sure what exactly you mean by "serious work", or where you got the idea that people do lots of it in the browser, but you can definitely save the results of a query from the browser in a variety of formats (pictures of the bubbles, result rows as CSV, the JSON response) if you prefer to work that way, or you can pipe data very efficiently into another language and manage it there. I don't see why they would re-create presentation and/or project management tools when there are already so many good ones out there.
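For what it's worth, here is a minimal sketch of that workflow using the official Python driver, runnable from a script or a Jupyter notebook; the URI, credentials, CSV path and Cypher are placeholders for your own project:

```python
# Repeatable import + query script: the whole "session" lives in a file
# you can version-control, rerun and share.
from neo4j import GraphDatabase
import pandas as pd

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

IMPORT = """
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
MERGE (p:Person {id: row.id})
SET p.name = row.name
"""   # the CSV must sit in Neo4j's import directory

EXPORT = """
MATCH (p:Person)-[:KNOWS]->(q:Person)
RETURN p.name AS person, count(q) AS contacts
"""

with driver.session() as session:
    session.run(IMPORT)                      # repeatable import step
    records = session.run(EXPORT).data()     # list of dicts

# Hand the result to your statistics tool of choice.
pd.DataFrame(records).to_csv("contacts.csv", index=False)
driver.close()
```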

Long distance OSM routing, how to work with all that data?

I am trying to build my own routing system using OSMSharp, and it will eventually have a full website front end deployed to Azure. However, I think I have a serious problem if I want to find a route over a long distance (e.g. NY -> CA). It looks like the routers in OSMSharp just accept a Stream of OSM data, but even the binary format (.osm.pbf) is roughly 10 GB of data, which seems like a huge performance concern.
Either I need to hold that huge file in memory (and who knows how much Azure will charge me for that, or how well OSMSharp/the CLR will handle it), or it needs to be broken up and stored in a DB for on-the-fly loading.
Can anyone give any insight into how this is usually handled? Am I way out of my league for a personal project? Maybe I should support just one US State?
Directly processing a .pbf file will be very inefficient because it just contains raw data; the format is not optimized for running queries against it. You need to pre-process the file: build a routing graph, drop the data you don't need, and store the result in some kind of database or another efficient format.
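As an illustration of that pre-processing step, here is a rough sketch using the pyosmium bindings (named purely for illustration; the same idea applies to OSMSharp's own streaming API). The file name, tag filter and storage step are all placeholders:

```python
# Stream the .osm.pbf once, keep only road ways, and build an adjacency
# list that can then be persisted to a database for routing queries.
import math
import osmium
from collections import defaultdict

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

class RoadGraphHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.edges = defaultdict(list)    # node id -> [(neighbour id, metres)]

    def way(self, w):
        if "highway" not in w.tags:       # drop everything that isn't a road
            return
        nodes = [n for n in w.nodes if n.location.valid()]
        for a, b in zip(nodes, nodes[1:]):
            d = haversine_m(a.location.lat, a.location.lon,
                            b.location.lat, b.location.lon)
            self.edges[a.ref].append((b.ref, d))
            self.edges[b.ref].append((a.ref, d))

handler = RoadGraphHandler()
handler.apply_file("extract.osm.pbf", locations=True)  # resolve node coords
# ...persist handler.edges (plus node coordinates) to your database here...
```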
For really long distances, consider using contraction hierarchies. They are used by many popular OSM routers, such as GraphHopper and OSRM.
It also helps to take a look at the various online and offline routers for OSM to get some ideas.

Where should computations take place for complex algorithms?

Background:
I'm a software engineering student and I have been looking at several algorithms for recommendation systems. One of these, collaborative filtering, has a lot of loops in it: it has to go through all of the users, and for each user all of the ratings he has made on movies or other rateable items.
I was thinking of implementing it in Ruby for a Rails app.
The point is that there is a lot of data to be processed, so:
1. Should this be done in the database, using regular queries, PL/SQL or something similar? (Testing DBs is extremely time-consuming and hard, especially for this kind of algorithm.)
2. Should I have a background job that caches the results of the algorithm? (If so, the data is processed in memory, and if there are millions of users, how well does that scale?)
3. Should I run the algorithm on every request, or every x requests? (Again, the data is processed in memory.)
The Question:
I know there are things that do this, like Apache Mahout, but they rely on Hadoop for scaling. Is there another way out? Is there a Mahout or machine-learning equivalent for Ruby, and if so, where does the computation take place?
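For concreteness, the loop structure being described looks roughly like this; a toy Python sketch of user-based collaborative filtering, purely to illustrate the O(users x ratings) cost, not a Ruby implementation:

```python
# Toy user-based collaborative filtering: score unseen items for a user
# by walking every other user and every rating they have made.
from math import sqrt

ratings = {                      # user -> {item: rating}, tiny example data
    "alice": {"matrix": 5, "inception": 4},
    "bob":   {"matrix": 4, "inception": 5, "up": 2},
    "carol": {"up": 5, "inception": 2},
}

def similarity(a, b):
    """Cosine similarity between two users over the items both rated."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    dot = sum(ratings[a][i] * ratings[b][i] for i in common)
    norm_a = sqrt(sum(v * v for v in ratings[a].values()))
    norm_b = sqrt(sum(v * v for v in ratings[b].values()))
    return dot / (norm_a * norm_b)

def recommend(user):
    scores = {}
    for other in ratings:                             # loop over all users...
        if other == user:
            continue
        sim = similarity(user, other)
        for item, rating in ratings[other].items():   # ...and all their ratings
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(recommend("alice"))   # "alice" has never rated "up", so it gets scored
```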
Here are my thoughts on each of the options:
1. No, it should not. Some calculations would be much faster to run in your database and some would not, but it would be hard and time-consuming to test exactly which parts should run in your DB, and you would probably find that some part of the algorithm is slow in PostgreSQL or whatever you use. More importantly, the database is not the right place to run this logic: as you say yourself, it would be hard to test, and it's bad practice overall. It would also hurt the performance of your other requests every time the DB has to run the algorithm, and the DB would still use a lot of memory processing it, so that isn't an advantage either.
2. By far the best solution. See below for more explanation.
3. This is a much better solution than number one, but it would make your app's performance very unstable: sometimes all resources would be free for normal requests, and sometimes they would all be consumed by your calculations.
Option 2 is the best solution, as it doesn't interfere with the performance of the rest of your app and is much easier to scale because it works in isolation. If, for example, your worker can't keep up, you can just add more worker processes.
More importantly, you would be able to run the background processes on a separate server, which makes it easy to monitor memory and resource usage and to scale that server as necessary.
Even for real-time updates, a background job is the best solution (unless, of course, the calculation is small enough to be done within the request). You could create a "high priority" queue that has enough resources to be almost always empty. If you need to show the result to the user without a reload, you would have to add some kind of push notification once a background job completes; the notification could then trigger an update on the page through JavaScript (you can also check out the new live streaming feature of Rails 4).
I would recommend something like Sidekiq with Redis. You could then cache the results in memcached, or you could recalculate the result each time; that really depends on how often you need it. With this solution, though, it is much easier to set up a stable cache if you want one.
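To make the worker-plus-cache shape concrete, here is a sketch with redis-py standing in for what a Sidekiq worker writing to Redis/memcached would do in the Rails stack; the key names, TTL and the recommendations module are made up for illustration:

```python
# Background worker computes and caches; the request path only reads.
import json
import redis   # redis-py, playing the role Redis plays behind Sidekiq

# Hypothetical module holding the expensive collaborative-filtering code
# (see the toy recommend() sketch shown with the question).
from recommendations import recommend

cache = redis.Redis(host="localhost", port=6379)

def recompute_recommendations(user_id):
    """Runs inside the background worker, never inside a web request."""
    recs = recommend(user_id)                  # the heavy nested loops
    cache.setex(f"recs:{user_id}", 3600,       # cache the finished result
                json.dumps(recs))              # ...for an hour

def recommendations_for(user_id):
    """Called from the request path: a cheap cache read, no recomputation."""
    cached = cache.get(f"recs:{user_id}")
    return json.loads(cached) if cached else []
```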
Where I work, we have an application that runs heavy queries with a lot of calculations like this. Each night these jobs are queued and then run on an isolated server over the next few hours. This scales really well and is also easy to monitor with New Relic.
Hope this helps and makes sense (I know my English isn't perfect); please feel free to ask if I've misunderstood something or you have more questions.

Will it be more cost-efficient to use a DSP/FPGA, instead of x86, to add a simple watermark to JPEG/GIF/... files?

My current project is to add a simple logo and some text to given graphics files. Each file is less than 500 KB on average. I am going to put the service online, so it should be able to handle 50 requests per second. Our current budget is limited. Any suggestions?
Start with OS-based (x86) image modification. Test the performance. If it's not up to snuff, then you can evaluate other approaches.
While an FPGA- or DSP-based approach may be faster, it's harder to find preexisting modules or libraries. You'll end up doing more work for gains that may not be worthwhile.
You should be able to get simple watermarking implemented very quickly, so you can get some performance numbers before all your functionality is present.
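As a starting point for that performance test, CPU-side watermarking can be prototyped in a few lines; here is a sketch with Pillow (the logo path, font and placement are placeholders) that you can benchmark against the 50 requests/second target:

```python
# Overlay a logo and a line of text on an image, then re-encode as JPEG.
from PIL import Image, ImageDraw, ImageFont

def watermark(src_path, dst_path, logo_path, text):
    img = Image.open(src_path).convert("RGBA")
    logo = Image.open(logo_path).convert("RGBA")
    img.paste(logo, (10, 10), logo)               # use the logo's alpha as mask

    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    draw.text((10, img.height - 30), text, font=font, fill=(255, 255, 255, 180))

    img.convert("RGB").save(dst_path, "JPEG", quality=90)

watermark("photo.jpg", "photo_marked.jpg", "logo.png", "(c) example.com")
```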

Low level programming: How to find data in the memory of another running process?

I am trying to write a statistics tool for a game by extracting values from the game process's memory (as there is no other way). The biggest challenge is finding the addresses that store the data I am interested in. What makes it even harder is dynamic memory allocation: I need to find not only the addresses that store the data but also the pointers to those memory blocks, because the addresses change every time the game restarts.
For now I am just manually searching the game's memory using a memory editor (ArtMoney), looking for addresses whose values change as the data changes (or don't change when it doesn't). Once an address is found, I look for a pointer to that memory block in a similar way.
I wonder what techniques/tools exist for such tasks? Maybe there are some articles I can read? Is mastering a disassembler the only way to go? Game trainers, for example, solve similar tasks, but they do it in days, while I have been struggling for weeks already.
Thanks.
PS. It's all under Windows.
Is mastering a disassembler the only way to go?
Yes; go download WinDbg from http://www.microsoft.com/whdc/devtools/debugging/default.mspx, or, if you've got some money to blow, IDA Pro is probably the best tool for doing this.
If you know how to code in C, it is easy to search for memory values. If you don't know C, this page might point you to a solution if you can code in C#. It would not be hard to port their C# to Java.
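For reference, the same scanning technique against the raw Win32 API (OpenProcess / VirtualQueryEx / ReadProcessMemory) looks roughly like this; a sketch in Python via ctypes rather than C or C#, where the pid and the 4-byte value being searched for are placeholders, and a real scanner would also filter by page protection and support re-scans:

```python
# Walk the committed memory regions of another process and report every
# address whose bytes match a given 32-bit integer.
import ctypes
import ctypes.wintypes as wt
import struct

PROCESS_VM_READ = 0x0010
PROCESS_QUERY_INFORMATION = 0x0400
MEM_COMMIT = 0x1000

kernel32 = ctypes.windll.kernel32

class MEMORY_BASIC_INFORMATION(ctypes.Structure):
    _fields_ = [("BaseAddress",       ctypes.c_void_p),
                ("AllocationBase",    ctypes.c_void_p),
                ("AllocationProtect", wt.DWORD),
                ("RegionSize",        ctypes.c_size_t),
                ("State",             wt.DWORD),
                ("Protect",           wt.DWORD),
                ("Type",              wt.DWORD)]

def scan_for_int32(pid, needle):
    """Return addresses in the target process whose bytes equal `needle`."""
    target = struct.pack("<i", needle)
    handle = kernel32.OpenProcess(
        PROCESS_VM_READ | PROCESS_QUERY_INFORMATION, False, pid)
    if not handle:
        raise OSError("OpenProcess failed (wrong pid or insufficient rights)")
    hits, addr, mbi = [], 0, MEMORY_BASIC_INFORMATION()
    while kernel32.VirtualQueryEx(handle, ctypes.c_void_p(addr),
                                  ctypes.byref(mbi), ctypes.sizeof(mbi)):
        if mbi.State == MEM_COMMIT:
            buf = ctypes.create_string_buffer(mbi.RegionSize)
            read = ctypes.c_size_t(0)
            if kernel32.ReadProcessMemory(handle, ctypes.c_void_p(addr),
                                          buf, mbi.RegionSize,
                                          ctypes.byref(read)):
                data, off = buf.raw[:read.value], 0
                while (off := data.find(target, off)) != -1:
                    hits.append(addr + off)
                    off += 1
        addr += mbi.RegionSize
    kernel32.CloseHandle(handle)
    return hits
```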
You might take a look at DynInst (Dynamic Instrumentation). In particular, look at the Dynamic Probe Class Library (DPCL). These tools will let you attach to running processes via the debugger interface and insert your own instrumentation (via special probe classes) into them while they're running. You could probably use this to instrument the routines that access your data structures and trace when the values you're interested in are created or modified.
You might have an easier time doing it this way than doing everything manually. There are a bunch of papers on those pages you can look at to see how other people built similar tools, too.
I believe the Windows support is maintained, but I have not used it myself.
