Parse an active log file

Looking for a little help getting started on a project I've had in the back of my mind for a while.
I have log files that vary in size from 50 to 500 MB, depending on how often they are cleaned. I'd like to write a program that monitors a log file while it is actively being written to. When in use, the file changes quickly, easily several hundred lines a second. Most if not all of the examples I've seen for reading log/text files simply open the file and read its contents into a variable, which isn't feasible to do every time the file changes in this situation. I've not settled on a language to write this in, but it's on a Windows box and I can work in .NET flavors, Java, or PHP (heh, I don't think PHP will fly too well for this), and I can likely muddle through another language if someone has a suggestion for something well built for handling this.
Essentially, I believe what I'm looking for is better described as a high-speed way of monitoring a text file for changes and seeing what those changes are. Each line written is relatively small (less than 300 characters, so there isn't much data on each line).
EDIT: Changed the wording to hopefully better describe what I'm trying to do, which is to write a program that keeps an eye on a log file for a trigger and then matches a following action to that trigger. So my question pertains to file handling inside a programming language.
I greatly appreciate any thoughts/comments.

If the file is only ever appended to, you can read the whole file the first time you start analyzing the log and keep the current size as n. The next time you check (maybe a timed action that checks the last-modified date), just skip the first n bytes, read all the new bytes, and update the size.
Otherwise you could run tail -f, capture its stdout, and use that for your purposes.
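The "remember n, skip n bytes, read the rest" idea can be sketched roughly as follows. This is only an illustration in Python, not something from the original answer: the class name LogFollower, the start_at_end flag, and the truncation handling are assumptions added for the example.

```python
import os


class LogFollower:
    """Remembers a byte offset and returns only lines appended since the last poll."""

    def __init__(self, path, start_at_end=False):
        self.path = path
        # Assumption: optionally skip existing content and only watch new lines.
        self.offset = os.path.getsize(path) if start_at_end else 0
        self._partial = b""  # bytes of a line the writer has not finished yet

    def poll(self):
        size = os.path.getsize(self.path)
        if size < self.offset:
            # The file shrank, so it was probably cleaned or replaced: start over.
            self.offset = 0
            self._partial = b""
        if size == self.offset:
            return []
        with open(self.path, "rb") as f:
            f.seek(self.offset)      # skip the first n bytes
            data = f.read()          # read everything that is new
            self.offset = f.tell()   # update n
        pieces = (self._partial + data).split(b"\n")
        self._partial = pieces.pop()  # empty, or an incomplete trailing line
        return [p.rstrip(b"\r").decode("utf-8", errors="replace") for p in pieces]
```

Calling poll() from a timer or a short sleep loop and checking each returned line for the trigger text would then cover the "keep an eye on the file" part without rereading the whole file.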

The 'keep an eye on a log file' part of what you are describing is what tail does.
If you plan to implement it in Java, you can check this question: Java IO implementation of unix/linux "tail -f" and add your trigger logic to lines read.

I suggest not reinventing the wheel.
Try the Elastic stack (elastic.co).
All of these applications are open source and free, and together they can monitor logs and trigger actions based on input.
Filebeat - reads the log file line by line (it supports multiline log messages as well) and sends it across to Logstash. There are loads of other shippers you can use.
Logstash - takes the log messages, filters them, adds tags, and sends the messages to Elasticsearch.
Elasticsearch - takes the log messages, indexes them, then stores them. It is also capable of running actions based on input.
Kibana - a user-friendly web interface to query and analyze the data, or simply to put it up on a dashboard.
Hope this helps.

Related

Move file after Pipeline has run

Is it possible to move a file in GCS after the Dataflow pipeline has finished running? If so, how? Should it be the last .apply? I can't imagine that being the case.
The case here is that we are importing a lot of .csv files from a client. We need to keep those CSVs indefinitely, so we either need to "mark the CSV as already handled" or, alternatively, move them out of the initial folder that TextIO uses to find the CSVs. The only thing I can currently think of is storing the file name (I'm not even sure how I'd get this; I'm a DF newbie) in BigQuery perhaps, and then somehow excluding files that have already been stored from the execution pipeline. But there has to be a better approach.
Is this possible? What should I check out?
Thanks for any help!
You can try using BlockingDataflowPipelineRunner and run arbitrary logic in your main program after p.run() (it will wait for the pipeline to finish).
See Specifying Execution Parameters, specifically the section "Blocking execution".
However, in general, it seems that you really want a continuously running pipeline that watches the directory with CSV files and imports new files as they appear, never importing the same file twice. This would be a great case for a streaming pipeline: you could write a custom UnboundedSource (see also Custom Sources and Sinks) that would watch a directory and return filenames in it (i.e. the T would probably be String or GcsPath):
p.apply(Read.from(new DirectoryWatcherSource(directory)))
.apply(ParDo.of(new ReadCSVFileByName()))
.apply(the rest of your pipeline)
where DirectoryWatcherSource is your UnboundedSource, and ReadCSVFileByName is also a transform you'll need to write that takes a file path and reads it as a CSV file, returning the records in it (unfortunately right now you cannot use transforms like TextIO.Read in the middle of a pipeline, only at the beginning - we're working on fixing this).
It may be somewhat tricky, and as I said, we have some features in the works to make it a lot simpler and we're considering creating a built-in source like that, but it's possible that for now this would still be easier than "pinball jobs". Please give it a try and let us know at dataflow-feedback@google.com if anything is unclear!
Meanwhile, you can also store information about which files you have or haven't processed in Cloud Bigtable - it'd be a better fit for that than BigQuery, because it's more suited for random writes and lookups, while BigQuery is more suited for large bulk writes and queries over the full dataset.

How can I use an online code compiler's output in a code compiler app?

I want to make an app which compiles Swift code, so how can I use a website named www.swiftstub.com or any similar website to retrieve the output of the code? I want my app to have a simple UITextView in which the user can type the code. If a UITextView cannot be used, what can be used?
I want my app to send the code typed to this website, and then retrieve the output back and display it. How can this be done? Thanks!
Make an app that sends text to a server.
Have the server compile this code, run it, and capture the output.
Send this output back to the app.
Have the app display the output.
This is an extreme oversimplification and there are lots of things to consider, but these are the big pieces of what there is to do. You need to understand very well, on the app side, UI and networking, and on the server side, triggering a compilation of text, capturing that output (command line; maybe Swift or Python can help you here), and HTTP(S) requests and responses (or sockets, which are even harder).
This does not sound like an easy task, so you are very courageous.
Build all components locally:
App:
- write an app with a text view and a button, and on the tap of that button, save the typed text to a text file. This avoids any networking complications at this point; later, instead of saving, you would send this to a server.
Server: (just build the things on your computer)
- write some script/program that can read in that textfile that was saved.
- Then you need to compile this code (look up 'xcrun' on Google) and capture the output. Save this output to a text file. Have your app load this file and display it.
The important thing to consider is the real server machine you will run this code on later: it has to be a machine that can compile and execute Swift code. Currently, this means it has to be an OS X machine. This is hard to find, as most servers run Linux, and there is no Linux Swift compiler yet.
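The "run something and capture its output" step on the server side could look roughly like this in Python. It is only a sketch: the helper name run_and_capture and the timeout value are made up for illustration, and the actual compiler command line is left to you.

```python
import subprocess
import sys


def run_and_capture(command, timeout_seconds=10):
    """Run a command and return (exit_code, stdout, stderr) as text.

    The timeout guards against submitted code that never finishes.
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,  # collect stdout and stderr instead of printing
            text=True,            # decode bytes to str
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return (None, "", "timed out")
    return (result.returncode, result.stdout, result.stderr)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; on the real server this
    # would be whatever invokes the Swift toolchain on the saved text file.
    code, out, err = run_and_capture([sys.executable, "-c", "print('hello')"])
    print(code, out, err)
```

The returned stdout (or stderr, if compilation fails) is what you would write to the output file for the app to load. Keep in mind that running code typed by strangers is risky, so a timeout alone is not enough protection on a real server.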
Getting this to work would be a proof of concept: you can capture text from the app, you can grab this text and compile it, you can capture the output of the compilation, and you can have the app read that output and display it.
Once you've got this working, you would need to find a server that can do the compilation part, and run what you build to do that. Then you would need to write some code in your app that sends an HTTP request to your server containing this text, to which your server would respond with the output of that compilation.
As I said, this is a big undertaking with lots of difficult parts and unexpected surprises, so don't expect it to be done in a couple of weeks; it will most likely take more than six months.
Try to find someone who has experience in programming and setting up a server. That will really help you a lot.

Specflow Feature File Best Practice

Thanks in advance for the help.
My question pertains to best practices inside a SpecFlow feature file.
Question:
Is using a wait command inside of the feature file considered bad practice?
Example:
And i click on the username
And wait 5 seconds
And i input new value into last name
The wait command forces a 5 second wait. I am doing this to make sure the page is loaded to prevent "element not found" errors or other errors. Basically to make sure I have a clean page to manipulate.
Would a better practice be to use a wait inside of the Step file itself?
//using Fluent Automation
I.WaitUntil(() => ());
//or
I.Wait(); //timespan
My reasoning for not using the Fluent Automation wait is:
By using the Fluent Automation method you are dependent on the default timeout in the Settings object. The default timeout in some cases may not be long enough or may be too long. It seems very verbose to me to continually change/reset the Settings object, with the only benefit being to remove wait commands from the feature file.
So what is really the best practice?
Thanks,
-n
I think the best practice is to keep the feature file for your scenarios, and free of the implementation details.
Since we are following a BDD process (http://dannorth.net/introducing-bdd) then the feature file is the output of that conversation between you and the process expert, and the scenario represents the steps that you are going to take to prove that your functionality works for that example. You could hope that those steps define the business process and could be performed by any system, not just the one we might be developing now. Ideally this logic captures our intent and can be reused on any future systems that might replace the current one.
So I just don't see you saying that you need to wait
....
Although you might want to say
When the page has loaded
and that maps quite nicely onto the fluent automation.
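To illustrate what a step like "When the page has loaded" could do behind the scenes, here is a generic polling helper. It is a Python sketch of the idea only and is not Fluent Automation's API; the name wait_until and its parameters are assumptions. Passing the timeout per call is one way around the concern about depending on a global default.

```python
import time


def wait_until(condition, timeout_seconds=5.0, poll_interval=0.1):
    """Poll condition() until it returns True or the timeout expires.

    Returns True as soon as the condition holds, False on timeout, so the
    step waits only as long as it has to instead of a fixed 5 seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

The step definition would pass in a check such as "the last name field is present", keeping the wait out of the feature file entirely.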

RnR: Long running process

I have a part of my application that creates an export file. The export process is fairly quick for the vast majority of users; however, there are users who generate 10,000 or more records. This complicates things. First, the tool that imports the files blows up on files larger than about 4,000 records. Second, the process for 10,000 records takes about 20 minutes. Users tend to start doing other things, and then, for whatever reason, the process seems to time out and they never get their file. However, if you click the process button and just leave your machine alone, 20 minutes later you will get the file.
I need to make this more user-friendly and robust. Here are my ideas:
1) automatically create separate files of 4,000 a pop
2) provide a status bar for the file generation
3) background the process so a user can click the button and come back say an hour later and download their files
So I have been doing research on the background plugins and gems. Most seem to be fairly out of date, which makes me nervous, and they seem to be major overkill for what I need. Spawn seemed simple and straightforward, but I'm unclear on how to do a status bar with that type of tool.
Then we have something like Delayed_job. This seems like it would work but also seems a little heavy, though it does provide the hooks to generate some kind of status update. Does anyone have an example of this? The README is a little light.
Another issue is the file generation: how do I get these multiple files to download? Is there any way I can store the generated files for the life of the user session?
Finally, most of the solutions look like a major change, and this issue is painful but technically works. So the time I am being allotted to solve it is minimal, and I am trying to KISS. Thanks for any help or direction you can provide.
If you're looking for background job processing, I'd suggest looking at Resque. It is very easy to set up and runs on Redis, as opposed to delayed_job, which polls your database for changes.
As for gathering progress info, there are a number of Resque plugins, and one of them can probably help you with that.
Lastly:
Another issue is the file generation, how do I get this multiple files to download? Anyway, I can store the generated file for the live of the user session?
I'm not sure what you actually meant, but if you want multiple files to download at once, zipping them into one archive may help.
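Splitting the export into files of 4,000 records and zipping them into a single download could be sketched like this. The example is in Python rather than Ruby purely for illustration, and the function names and the export_NNN.csv naming scheme are made up for the sketch.

```python
import io
import zipfile


def chunked(records, size):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def build_export_zip(records, chunk_size=4000):
    """Return the bytes of a zip archive holding one file per chunk of records."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, chunk in enumerate(chunked(records, chunk_size), start=1):
            # Each entry stays under the importer's limit of about 4,000 records.
            archive.writestr(f"export_{index:03d}.csv", "\n".join(chunk) + "\n")
    return buffer.getvalue()
```

With 10,000 records this produces three entries (4,000, 4,000, and 2,000 records), which the background job can write to disk once and offer as one download link.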

How can I keep a large amount of OutputDebugString() calls from degrading my application in the Delphi 6 IDE?

This has happened to me on more than one occasion and has led to many lost hours chasing a ghost. As usual, when I am debugging some really difficult timing-related code, I start adding tons of OutputDebugString() calls so I can get a good picture of the sequence of related operations. The problem is that the Delphi 6 IDE seems to be able to handle that situation only for so long. I'll use a concrete example I just went through to avoid generalities (as much as possible).
I spent several days debugging my inter-thread semaphore locking code along with my DirectShow timestamp calculation code that was causing some deeply frustrating problems. After having eliminated every bug I could think of, I still was having a problem with Skype, which my application sends audio to.
After about 10 seconds the delay between my talking and hearing my voice come out of Skype on the second PC that I was using for testing, the far end of the call, started to grow. At around 20 - 30 seconds the delay started to grow exponentially and at that point triggered code I have that checks to see if a critical section was being held too long.
Fortunately it wasn't too late at night and having been through this before, I decided to stop relentlessly tracing and turned off the majority of the OutputDebugString(). Thankfully I had most of them wrapped in a conditional compiler define so it was easy to do. The instant I did this the problems went away, and it turned out my code was working fine.
So it looks like the Delphi 6 IDE starts to really bog down when the amount of OutputDebugString() traffic is above some threshold. Perhaps it's just the task of adding strings to the Event Log debugger pane, which holds all the OutputDebugString() reports. I don't know, but I have seen similar problems in my applications when a TMemo or similar control starts to contain too many strings.
What have those of you out there done to prevent this? Is there a way of clearing the Event Log via some method call or at least a way of limiting its size? Also, what techniques do you use via conditional defines, IDE plug-ins, or whatever, to cope with this situation?
A similar problem happened to me before with Delphi 2007. Disable event viewing in the IDE and instead use DebugView from Sysinternals.
I hardly ever use OutputDebugString. I find it hard to analyze the output in the IDE and it takes extra effort to keep several sets of multiple runs.
I really prefer a good logging component suite (CodeSite, SmartInspect) and usually log to various files. Standard files for example are "General", "Debug" (standard debug info that I want to collect from a client installation as well), "Configuration", "Services", "Clients". These are all set up to "overflow" to a set of numbered files, which allows you to keep the logs of several runs by simply allowing more numbered files. Comparing log info from different runs becomes a whole lot easier that way.
In the situation you describe I would add debug statements that log to a separate logfile. For example "Trace". The code to make "Trace" available is between conditional defines. That makes turning it on pretty simple.
To avoid leaving in these extra debug statements, I tend to make the changes to turn on the "Trace" log without checking it out from source control. That way, the compiler of the build server will throw out "identifier not defined" errors on any statements unintentionally left in. If I want to keep these extra statements I either change them to go to the "Debug" log, or put them between conditional defines.
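The general shape of this setup (named log files that overflow into numbered files, plus a "Trace" log that is easy to switch on and off) can be sketched with Python's standard logging module. This is a stand-in for illustration and not how CodeSite or SmartInspect work; the function names and the APP_TRACE environment switch are hypothetical.

```python
import logging
import logging.handlers
import os


def make_logger(name, directory, max_bytes=1_000_000, backups=5):
    """Create a logger writing to <name>.log that overflows into numbered files."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = logging.handlers.RotatingFileHandler(
        os.path.join(directory, name + ".log"),
        maxBytes=max_bytes,   # roll over to <name>.log.1, .2, ... at this size
        backupCount=backups,  # keep this many numbered files
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger


# Hypothetical runtime switch playing the role of a conditional define.
TRACE_ENABLED = os.environ.get("APP_TRACE") == "1"


def trace(logger, message):
    """Write to the trace log only when tracing is switched on."""
    if TRACE_ENABLED:
        logger.debug(message)
```

Raising the number of backups keeps the logs of more runs around for comparison.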
The first thing I would do is make certain that the problem is what you think it is. It has been a long time since I've used Delphi, so I'm not sure about the IDE limitations, but I'm a bit skeptical that the event log will start bogging down exponentially over time with the same number of debug strings being written in a period of 20-30 seconds. It seems more likely that the number of debug strings being written is increasing over time for some reason, which could indicate a bug in your application control flow that is just not as obvious with the logging disabled.
To be sure I would try writing a simple application that just runs in a loop writing out debug strings in chunks of 100 or so, and start recording the time it takes for each chunk, and see if the time starts to increase as significantly over a 20-30 second timespan.
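The measurement itself could be structured roughly like this. The sketch is in Python only to show the shape of the test; the real test app would be written in Delphi and call OutputDebugString directly. The name time_chunks and the emit parameter are made up for the example.

```python
import time


def time_chunks(emit, chunks=50, chunk_size=100):
    """Call emit(message) chunk_size times per chunk; return seconds per chunk."""
    durations = []
    for chunk in range(chunks):
        started = time.perf_counter()
        for i in range(chunk_size):
            emit(f"chunk {chunk} message {i}")
        durations.append(time.perf_counter() - started)
    return durations


if __name__ == "__main__":
    # Stand-in sink so the sketch runs anywhere. On Windows, emit could be
    # ctypes.windll.kernel32.OutputDebugStringW to exercise the real call.
    collected = []
    for index, seconds in enumerate(time_chunks(collected.append, chunks=5)):
        print(f"chunk {index}: {seconds:.6f} s")
```

If the per-chunk times stay flat while running under the debugger, the event log is probably not the bottleneck; if they climb steadily, it probably is.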
If you do verify that this is the problem - or even if it's not - then I would recommend using some type of logging library instead. OutputDebugString really loses its effectiveness when you use it for massive log dumps like that. Even if you do find a way to reset or limit the output window, you'd be losing all of that logging data.
IDE Fix Pack has an optimisation to improve the performance of OutputDebugString:
The IDE’s Debug Log View also got an optimization. The debugger now
updates the Log View only when the IDE is idle. This allows the IDE to
stay responsive when hundreds of OutputDebugString messages or other
debug messages are written to the Debug Log View.
Note that this only runs on Delphi 2007 and above.

Resources