Is there an easy way to collect clickstream data from a Rails application?
(for example, by activating a gem)
I searched RubyGems but couldn't find anything.
You are more likely to find a solution in JavaScript, since Rails is a server-side technology and clicks happen in the browser. For example, a service like Clickstreamr only requires you to copy/paste some JavaScript code, and it will record your clickstream data.
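If you'd rather roll your own, the client side is only a few lines. A minimal sketch, assuming a hypothetical /clicks endpoint in your Rails app:

```js
// Minimal clickstream recorder: log every click and ship it to the server
// without blocking navigation. The /clicks endpoint is hypothetical; a Rails
// route and controller would have to accept the POSTed JSON.
document.addEventListener('click', function (event) {
  var payload = JSON.stringify({
    tag: event.target.tagName,
    id: event.target.id,
    page: window.location.pathname,
    ts: Date.now()
  });
  // sendBeacon queues the request even if the user is navigating away.
  navigator.sendBeacon('/clicks', payload);
});
```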
I am trying to develop a crawler to crawl youtube.com, parse the meta information (title, description, publisher, etc.), and store it in HBase or another storage system. I understand that I have to write plugin(s) to achieve this, but I'm confused about which plugins I need to write. I am looking at these four:
Parser
ParseFilter
Indexer
IndexingFilter
To parse the specific metadata from a YouTube page, do I need to write a custom Parser plugin or a ParseFilter plugin alongside the parse-html plugin?
After parsing, do I need to write an IndexWriter plugin to store the entry in HBase or another storage system? By indexing, we generally mean indexing into Solr, Elasticsearch, etc., but I obviously don't need to index into any search engine. So how can I store the parsed entries in a store such as HBase?
Thanks in advance!
Since YouTube is a web page, you'll need to write an HtmlParseFilter, which gives you access to the raw HTML fetched from the server. However, YouTube currently uses a LOT of JavaScript, and neither parse-html nor parse-tika supports executing JS code, so I'd advise you to use the protocol-selenium plugin: it delegates the rendering of the web page to the Selenium driver and gives you back the HTML after all the JS has been executed. After writing your own HtmlParseFilter, you'll need to write your own IndexingFilter; in this case you only need to specify what info you want to send to your backend. This part is totally backend-agnostic and relies only on the Nutch codebase (which is why you'll also need your own IndexWriter).
I assume that you're using Nutch 1.x; in that case, yes, you need to write a custom IndexWriter for your backend (which is fairly easy). If you use Nutch 2.x, you'd have access to several backends through Apache Gora, but then you'd be missing some features (like protocol-selenium).
I think you should use something like Crawler4j for your purposes.
The real power of Nutch comes into play when you want to do a much wider crawl or index your data directly into Solr/ES. Since you just want to download the data for each URL, I would go with Crawler4j: it's much easier to set up and doesn't require complex configuration.
I have a web application in RoR that calculates some energy values and investment figures. I use Ajax to send the data from the browser to the server, so the flow is something like: browser-server-browser-server-browser.
This web application is already integrated into TYPO3, and I want to implement a PDF button that sends the results by email (in other words, a snapshot of the page with the results).
I have heard that one option would be to generate some links in RoR to be used in TYPO3 (clicking one would open the web application with the results already calculated). But as a newbie, I don't really know which approach would be best.
Any recommendation?
A screenshot of the page can be done client-side:
http://html2canvas.hertzen.com/
You could even have a separate page with the same results, used only for rendering a clean screenshot of the result page (you might not want the footer, menu, and other elements on that page, only the results).
Once you have your screenshot, you can upload it to your server, where you can turn the image into a PDF and send it with whatever mail API you prefer.
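A sketch of the capture-and-upload step, assuming a recent html2canvas build with the Promise API; the #results element and /screenshots endpoint are hypothetical:

```js
// Render the results container to a canvas, then upload it as a PNG.
html2canvas(document.getElementById('results')).then(function (canvas) {
  canvas.toBlob(function (blob) {
    var form = new FormData();
    form.append('screenshot', blob, 'results.png');
    // The server-side /screenshots action (hypothetical) would turn the
    // image into a PDF and mail it via TYPO3's mail API.
    fetch('/screenshots', { method: 'POST', body: form });
  });
});
```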
Info about TYPO3's mail API can be found here:
https://docs.typo3.org/typo3cms/CoreApiReference/ApiOverview/Mail/Index.html
I am using Chartkick to generate graphs in my Ruby on Rails application. I can display the charts without any problem. The issue is that I need to refresh the page containing the graph each time data is added in order to see the updated graph. How do I make the graph alone reload every time I add new data to the database? Is that possible?
Note: Highcharts is not an option. I am building this for a commercial website, and they can't afford Highcharts.
I ended up using Chart.js with Ajax requests that poll the database for changes and update the graph. I was not successful in implementing WebSockets.
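A minimal sketch of that polling approach, assuming Chart.js 2+ and a hypothetical /metrics endpoint returning {labels, values} as JSON:

```js
var ctx = document.getElementById('chart').getContext('2d');
var chart = new Chart(ctx, {
  type: 'line',
  data: { labels: [], datasets: [{ label: 'Values', data: [] }] }
});

// Poll the server every 5 seconds and redraw only the chart, not the page.
setInterval(function () {
  fetch('/metrics')                                  // hypothetical endpoint
    .then(function (res) { return res.json(); })
    .then(function (json) {
      chart.data.labels = json.labels;
      chart.data.datasets[0].data = json.values;
      chart.update();
    });
}, 5000);
```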
To achieve what you want, you need bidirectional communication between your server and your client.
I suggest using WebSockets. They suffer from not being supported by all browsers, but as a fallback you can poll the server for new results (if you care enough to provide a fallback). To check the current support status, see this page.
Follow this example for the websocket-rails gem to get started.
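Independent of the gem, the browser side of this approach is small. A sketch with a polling fallback; the endpoint names are assumptions:

```js
// Fallback: poll the server every 10 seconds.
function startPolling() {
  setInterval(function () {
    fetch('/graph_data')                             // hypothetical endpoint
      .then(function (res) { return res.json(); })
      .then(updateChart);
  }, 10000);
}

function updateChart(data) {
  // redraw the graph with the new data here
}

if ('WebSocket' in window) {
  var socket = new WebSocket('wss://example.com/updates'); // hypothetical URL
  socket.onmessage = function (event) { updateChart(JSON.parse(event.data)); };
  socket.onerror = startPolling;
} else {
  startPolling();
}
```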
I would use something like Pusher; it's made for doing exactly what you need.
Use Turbolinks to reload the graph.
I'm trying to use cURL to send some data to a form on a site where I'm a member, but when I look at the headers being sent, they seem to have been encrypted.
Is there a way I can get around this by making the computer/server visit the site, actually enter the data into the form inputs, and then hit submit, so that it would generate the correct data and post the form?
You have a few options:
reverse-engineer the JavaScript that performs the encryption (or possibly just encoding)
take a browser engine (e.g. the Gecko engine) and add some scripting to it to fill in the forms and push the submit button; of course, you would need JavaScript support within the page itself
parse the HTML using an HTML parser, feed the JavaScript in it to a JavaScript runtime with the correct libraries, fill in the "form", and hit the submit button
The first option is probably easiest: the JavaScript must be out in the open for the browser to execute it, but it may take some time to reverse-engineer, as it is likely obfuscated.
You can use a framework like Selenium to automate user interaction on web pages.
That way you don't have to reverse-engineer anything.
Selenium has bindings in various languages, including Python and Java.
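It also ships JavaScript bindings (the selenium-webdriver npm package). A minimal sketch; the URL and field names are hypothetical:

```js
// npm install selenium-webdriver; a browser driver (e.g. geckodriver) must
// be on the PATH. URL and field names below are hypothetical.
const { Builder, By, until } = require('selenium-webdriver');

(async function () {
  const driver = await new Builder().forBrowser('firefox').build();
  try {
    await driver.get('https://example.com/form');
    await driver.findElement(By.name('username')).sendKeys('me');
    await driver.findElement(By.name('message')).sendKeys('hello');
    // Submitting through the page runs the site's own JavaScript,
    // so the headers come out encrypted exactly as the site expects.
    await driver.findElement(By.css('form')).submit();
    await driver.wait(until.urlContains('success'), 10000);
  } finally {
    await driver.quit();
  }
})();
```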
Provided the JavaScript is visible on the website in question, you should be able to simply copy and paste their encryption routines to prepare the headers exactly as they do.
A hacky fix, if you can isolate the function that encodes the data you type into the form, is to use something like PyV8 to execute the JS inside Python.
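PyV8 embeds V8 in Python; the analogous trick in JavaScript itself (a swapped-in approach, not what this answer used) is to run the extracted routine in Node's built-in vm module. A sketch, where encode.js and encode() are hypothetical:

```js
const fs = require('fs');
const vm = require('vm');

// The site's extracted script, saved locally (file name is hypothetical).
const siteJs = fs.readFileSync('encode.js', 'utf8');

// Evaluate the script in an isolated sandbox, then call the (hypothetical)
// encode() function it defines on our own form data.
const sandbox = {};
vm.createContext(sandbox);
vm.runInContext(siteJs, sandbox);
const encoded = vm.runInContext('encode("my form data")', sandbox);
console.log(encoded); // use this value in the cURL request headers
```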
Use AutoHotkey and actually have it drive the browser normally. It can read from files and repeat tasks indefinitely. You can also set a flag so it only acts within that application, which means you can keep it minimized and still perform the action.
You seem to be running into the problem that they encrypt the headers, so why not use that to your advantage? You're still pushing the same data in, but now you're working around their system, with little to no side effect for you.
Is there any way to do offline syncing with a Rails project?
In other words, our client uses their site to show a photo gallery, but they need to be able to do so without an active internet connection. At any time they can get back online and download any new data, so they can continue showing their gallery.
Thanks!
You will have to build a JavaScript client application that stores changes and state in HTML5 local storage. That way the user can keep performing actions, which can be saved and synced to the server later (e.g. when they are connected to the internet again).
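A minimal sketch of such an offline queue, using localStorage and the browser's online event; the /api/sync endpoint is hypothetical:

```js
// Queue actions locally; flush them to the server once we are back online.
function recordAction(action) {
  var queue = JSON.parse(localStorage.getItem('pending') || '[]');
  queue.push(action);
  localStorage.setItem('pending', JSON.stringify(queue));
  if (navigator.onLine) flush();
}

function flush() {
  var queue = JSON.parse(localStorage.getItem('pending') || '[]');
  if (queue.length === 0) return;
  fetch('/api/sync', {                               // hypothetical endpoint
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(queue)
  }).then(function () {
    localStorage.removeItem('pending');
  });
}

// The browser fires 'online' when connectivity returns.
window.addEventListener('online', flush);
```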
Sproutcore would be ideal for this. I am not sure whether any of the up-and-coming JavaScript libraries (Backbone.js, Spine.js) interact with local storage.
Hope this helps.
I ended up using rack-offline, though no one suggested it here.
It can be found on GitHub if anyone is interested.