Parsing search results from website, compojure/clojure - parsing

I've been working on a simple Clojure project about movies for some time, and I'm trying to parse search results from a particular website, in my case imdb.com. I'm not sure I'm on the right track, so I'm hoping someone can help me out.
The homepage is simple: a text field where you enter a movie name and a submit button labelled "Search". I'll try to be as detailed as possible:
1. This is the main page:
(defn view-input []
  (view-layout
    [:h2 "Find your Movie"]
    [:body {:style "font: 14pt/16pt helvetica; background-color: #F2FB78; padding-top:100px; text-align: center"}
     (form-to [:post "/"]
       [:br]
       [:br]
       (text-field {:placeholder "Enter movie name"} :a) [:br]
       (submit-button "Search"))]))
2. These are the functions that I've been using:
(defn create-flick-url [a]
  (str "http://www.imdb.com/search/title?title=" a "&title_type=feature"))
(defn flick-vec [categories a]
  (vec (let [flick-url (create-flick-url a)
             flick-names (print-flick-name-content flick-url)]
         (mapper-gen4 flick-names
                      (get-image-content flick-url)))))
(defn view-output2 [categories a]
  (view-layout
    [:h2 "Search results"]
    [:form {:method "post" :action "/"}
     (interleave
       (for [flick (flick-vec categories a)]
         (label :title (:name flick)))
       (for [flick-name (flick-vec categories a)]
         [:br])
       (for [flick-image (flick-vec categories a)]
         [:img {:src (:image flick-image)}])
       (for [flick (flick-vec categories a)]
         [:br]))]))
3. And this is the GET/POST part in the same namespace, where I'm using the view-input and view-output2 functions:
(defroutes main-routes
  (GET "/" []
    (view-input))
  (POST "/" [categories a]
    (view-output2 categories a)))
4. Also, these are the functions that the previous ones use:
(defn print-flick-name-content
  [url]
  (vec (flatten (map :content (h3+table url)))))
(defn get-image-content
  [url]
  (vec (flatten (map #(re-find #"http.*jpg" %)
                     (map :style (map :attrs (h3+table2 url)))))))
(defn get-page
  "Gets the html page from passed url"
  [url]
  (html/html-resource (java.net.URL. url)))
(defn h3+table
  "Return seq of <h3> and table tags, where content of the <h3> tag meet defined condition"
  [url]
  (html/select (get-page url)
               [:td (html/attr= :class "title") :h3 :a]))
(defn h3+table2
  "Return seq of <h3> and table tags, where content of the <h3> tag meet defined condition"
  [url]
  (html/select (get-page url)
               [:td (html/attr= :class "image")]))
5. And here's the last one, a function defined in another namespace that builds the maps:
(defn mapper-gen4
  [names images]
  (sort-by :name (map #(hash-map :name %1 :image %2) names images)))
I know it's a lot of code, but this way someone can see where the problem is. So far the search results page shows neither results nor errors, only a blank page with the h2 "Search results" title. Thanks in advance!

I would start at the form handling route:
(POST "/" [categories a]
  (view-output2 categories a))
inserting a humble print statement:
(POST "/" [categories a]
  (do
    (println "CAT" categories "A" a)
    (view-output2 categories a)))
Make sure your handler includes wrap-reload so you can refresh the page and check the console. You might see that categories and a are nil, in which case you might next try something like this:
(POST "/" req
  (do
    (println "REQ" req)
    (view-output2 *hard-coded-categories* *hard-coded-a*)))
Replace *hard-coded-categories* and *hard-coded-a* with the data structures you are expecting to see. This will tell you two things:
1. Where the parameters are in the request.
2. Whether your rendering code does what you expect with the right data.
If the problem is indeed that categories and a are nil, it might just be that you forgot a middleware handler, such as Ring's wrap-params, which parses form parameters into the request.
If they contain the data you expect in the structure you expect, then it is time to drill down into your other functions. For this I recommend using a REPL session and calling your top-level function with the hard-coded values you expect from the form; if you are using wrap-reload, you could also just resubmit the form. For example, you could add printouts of the inputs and outputs of your mapper-gen4 and get-page functions.
Finally after playing with some values and results, copy these from your REPL into a test file so that you have some permanent assertions about how your code behaves.
If all else fails, posting a link to your GitHub project will get you better help. If it is private, create a minimal example project so that people can help you more precisely.

Related

Ruby - link_to - How to add data directly from DB

First of all, I am very new to ruby and I am trying to maintain an application already running in production.
I have been so far able to "interpret" the code well, but there is one thing I am stuck at.
I have a haml.html file where I am trying to display links from DB.
Imagine a DB structure like below
link_name - Home
URL - /home.html
class - clear
id - homeId
I display a link on the page as below
<a href="/home.html" class="clear" id="homeId">Home</a>
To do this I use 'link_to' where I am adding code as follows
-link_to model.link_name , model.url, {:class => model.class ...... }
Now I have a new requirement: the DB holds free text, something like
data-help="home-help" data-redirect="home-redirect"
which needs to end up in the options.
So the code in HAML needs to output that content directly rather than assign it to a single attribute.
In other words, I am able to produce
attr='data-help="home-help" data-redirect="home-redirect"' inside the <a> tag, but not
data-help="home-help" data-redirect="home-redirect" as separate attributes of the <a> tag.
Any help would be greatly appreciated!
link_to accepts a hash, :data => { :foo => "bar" }, of key/value pairs that it turns into data- attributes on the anchor tag. The example above produces the attribute data-foo="bar".
So you could write a method on the model that takes self.data_fields (or whatever it's called), splits it into attribute pairs, and builds a hash from them. Then you can pass that hash directly to the :data option of link_to with :data => model.custom_data_fields_hash
This somewhat verbose method splits things out and returns a hash containing: {"help"=>"home-help", "redirect"=>"home-redirect"}
def custom_data_fields_hash
  # this would be replaced by self.your_models_attr
  data_fields = 'data-help="home-help" data-redirect="home-redirect"'
  # split the full string by spaces into attr pairs
  field_pairs = data_fields.split " "
  results = {}
  field_pairs.each do |field_pair|
    # split the attr and value by the =
    data_attr, data_value = field_pair.split "="
    # remove the 'data-' substring because the link_to will add that in automatically for :data fields
    data_attr.gsub! "data-", ""
    # Strip the quotes, the helper will add those
    data_value.gsub! '"', ""
    # add the processed pair to the results
    results[data_attr] = data_value
  end
  results
end
Running this in a Rails console gives:
2.1.2 :065 > helper.link_to "Some Link", "http://foo.com/", :data => custom_data_fields_hash
=> "<a data-help=\"home-help\" data-redirect=\"home-redirect\" href=\"http://foo.com/\">Some Link</a>"
Alternatively, you could make it a helper that takes the string as an argument and pass in the model's attribute instead:
link_to "Some Link", "http://foo.com/", :data => custom_data_fields_hash(model.data_fields_attr)
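A minimal sketch of that helper variant, assuming the free text always consists of name="value" pairs with double-quoted values. The method name data_attributes_hash and its argument are hypothetical. It uses String#scan rather than splitting on spaces, so values that contain spaces are kept whole.

```ruby
# Hypothetical helper: turns 'data-help="home-help" data-redirect="x"' into
# {"help" => "home-help", "redirect" => "x"} for link_to's :data option.
def data_attributes_hash(raw)
  # Each match is a [name, value] pair; only double-quoted values are recognised.
  pairs = raw.to_s.scan(/([\w-]+)="([^"]*)"/)
  pairs.each_with_object({}) do |(name, value), result|
    # link_to adds the "data-" prefix itself, so drop it from the key.
    result[name.sub(/\Adata-/, "")] = value
  end
end

# Possible use in a view (model.data_fields_attr as in the answer above):
#   link_to "Some Link", "http://foo.com/", :data => data_attributes_hash(model.data_fields_attr)
```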
I'm not sure you can embed an attribute string directly. You could instead parse the string into a hash and pass that to link_to (here str is the free-text attribute string from the DB):
- link_to model.link_name, model.url,
  { :class => model.class }.merge(Hash[str.scan(/([\w-]+)="([^"]*)"/)])

Add param to an existing URL in Rails 4 link_to

In my Rails project I am at this URL, the result of a search form:
http://localhost:3000/buscar?localidad=&page=2&&query=de
I also have an (a..z) list to filter the results alphabetically. When I press the 'A' button I want to add a parameter to the current URL, so it would look like this:
http://localhost:3000/buscar?localidad=&page=2&&query=de&start=A
But my 'A' button uses this link_to:
= link_to 'A', search_index_path(start: 'A')
so the URL is just:
http://localhost:3000/buscar?start=A
which drops all the params I had before.
How can I append the new param to the params already in the URL?
Thanks!
You can do:
= link_to 'A', params.merge(start: 'A')
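To illustrate what merging into the existing parameters achieves, here is a plain-Ruby sketch that uses only the standard uri library. The helper name add_query_param is hypothetical and is not part of Rails. It keeps every existing query pair and adds the new one, replacing an earlier value for the same key.

```ruby
require "uri"

# Hypothetical helper: returns url with key=value added to its query string,
# replacing any existing entry for the same key.
def add_query_param(url, key, value)
  uri = URI.parse(url)
  # decode_www_form yields [name, value] pairs; drop a previous entry for key.
  pairs = URI.decode_www_form(uri.query.to_s).reject { |name, _| name == key.to_s }
  pairs << [key.to_s, value.to_s]
  uri.query = URI.encode_www_form(pairs)
  uri.to_s
end

# Example call:
#   add_query_param("http://localhost:3000/buscar?localidad=&page=2&query=de", :start, "A")
```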

Is it a security risk to allow the following HTML elements (e.g. code, pre)?

I'm using the following plugin: https://github.com/jhollingworth/bootstrap-wysihtml5/
This is how I'm sanitizing my input/outputs in my Rails app:
post.rb:
protected
def clean_input
  self.content = sanitize(self.content, :tags => %w(b i u br p span blockquote pre code), :attributes => %w(id class style))
end
posts/show.html.erb:
<p><%= sanitize @post.content, :tags => %w(b i u p span br blockquote pre code), :attributes => %w(id class style) %></p>
These are the parser rules for wysihtml5 (the editor allows tags like b, i, etc. by default):
shared/editor_toolbar:
parserRules: {
  classes: {
    "ruby": 1,
    "variable": 1,
    "string": 1
  },
  tags: {
    span: {},
    code: {},
    pre: {}
  }
},
So, right now the user can input and the app can output something like this:
<pre class="ruby">
<code>
<span class="variable">
$(</span><span class="string">'.wysihtml5'</span>).wysihtml5({<span class=
"string">'stylesheets'</span>: false});
</code>
</pre>
(The user can switch between the visual and HTML views.)
I hope this is not a stupid question (I'm not very familiar with security), but is this relatively safe or dangerous? If it is dangerous, how do I prevent it?
I really don't know about Ruby, but in PHP you can allow tags like that, and from what I've experimented with, it's NOT secure at all. The reason is that attributes on these allowed tags are not sanitized, so any user could enter a harmless-looking <span></span> tag with this added to it:
<span onmouseover="hack_the_whole_fucking_website();">contenthere</span>
This way, the JavaScript is executed when a user moves the mouse over the element. From there, I guess a hacker could steal users' cookies, including the session cookie, hijack user sessions, perhaps hijack an admin session, and then take down your website. It's an open door for hackers.
The solution I use for this is BBCode tags. They are substitutes for existing HTML tags. Some examples:
<i> = [i]
<img src="#"> = [img=#]
<a href="#">text</a> = [url=#]text[/url]
...
The output of the editor should be in this format, so you can run a sanitizing script that deletes all real HTML tags. Then, when it's time to output this data to the user, you replace these substitute tags with the real HTML tags using some regular expressions. :)
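A rough Ruby sketch of the approach described above, under stated assumptions: the method name render_bbcode is hypothetical, only [i] and [url=...] are handled, and real HTML is escaped rather than deleted, which has the same neutralizing effect. It is not a complete sanitizer.

```ruby
require "cgi"

# Hypothetical renderer for the BBCode approach: escape everything first,
# then turn a small whitelist of BBCode tags back into HTML.
def render_bbcode(text)
  # After this step no user-supplied <, >, & or quote can form real markup.
  html = CGI.escapeHTML(text.to_s)
  # [i]...[/i] becomes <i>...</i>
  html = html.gsub(/\[i\](.*?)\[\/i\]/m, '<i>\1</i>')
  # [url=...]label[/url] becomes a link, but only for http(s) targets,
  # so javascript: URLs are left as inert text.
  html.gsub(/\[url=(https?:\/\/[^\]\s]+)\](.*?)\[\/url\]/m, '<a href="\1">\2</a>')
end
```

Escaping before substituting means an event-handler attribute such as onmouseover never reaches the browser as markup; only the tags the renderer itself emits are real HTML.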

saving collection to database using simple_form in rails

I am using simple form to make checkboxes as below:
module ApplicationHelper
  def all_colors
    t = [
      "A",
      "B",
      "C",
      "D"
    ]
    return t
  end
end
<%= f.input :colors, :label => false, :collection => all_colors, :as => :check_boxes %>
When the user selects some checkboxes and submits the form, the data saved in the database also contains empty entries where items were not selected.
For example, if the user checked B and D, the data saved in the DB looks like:
---
- ""
- B
- ""
- D
I want the data to be saved as CSV
At a glance, it seems you might be passing extra commas in your input, so when your user checked B and D, what got passed back in the request was ,B,D,
When the input was parsed for saving, the extra commas were translated into empty values.
Check the request variables that were posted back by the form submit and strip away the extra commas.
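Stripping the empty values could look like the sketch below, assuming the submitted value arrives as an array such as ["", "B", "", "D"], as the saved data in the question suggests. The helper name colors_to_csv is hypothetical.

```ruby
# Hypothetical helper: drops the blank entries submitted with the checkbox
# collection and joins the rest into one comma-separated string.
def colors_to_csv(values)
  Array(values).map(&:to_s).reject { |v| v.strip.empty? }.join(",")
end
```

The result could then be assigned to the attribute before saving, for example in a model callback or in the controller.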

convert array of parameters from form into string

I have a form with checkboxes:
- form_tag filter_path(@page.permalink), :method => 'get' do |f|
  - ftype.producers.each do |producer|
    = check_box_tag "producers[]", producer.id, false
    = label_tag producer.title
    %br
  = submit_tag 'Сортувати', :name => nil
When I send a request, it sends a params hash with an array of producers. The link then looks like this:
'/pages/:page_id/filter?producers[]=4&producers[]=5'
And I want to make it look like this:
'/pages/:page_id/filter?producers=4,5'
Please help
It shouldn't be a problem, since ?producers[]=4&producers[]=5 will be converted by the framework into a params[:producers] array with the value ["4", "5"] (the values arrive as strings).
Thus, you already have an array and you don't even have to parse anything.
But if you really want to submit two input values in one parameter, you'd have to employ some JavaScript. By default, if an HTML form has two inputs with the same name, two independent values will be submitted (as in the sample URL you provided).
So, it's not a Rails question, it's an HTML and JavaScript question.
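If the compact form is only needed on the server side, for example to build a link, converting between the two shapes is short. The sketch below is plain Ruby using the standard uri library rather than Rails, and the helper names producer_ids and compact_producers are hypothetical.

```ruby
require "uri"

# Hypothetical helper: collects the values of the repeated "producers[]"
# fields from a raw query string. The values come back as strings.
def producer_ids(query)
  URI.decode_www_form(query)
     .select { |name, _| name == "producers[]" }
     .map { |_, value| value }
end

# Hypothetical helper: builds the compact form, e.g. "producers=4,5".
def compact_producers(ids)
  "producers=" + ids.join(",")
end
```

In a Rails controller the first step is already done by the framework, so params[:producers].join(",") would give the comma-separated string directly.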

Resources