Filter tweet keywords - twitter

I'm trying to filter out certain words with Twython before retweeting. I can't figure out how to get it to work: instead of filtering out those words, it's adding them to the ones to retweet. Here is my code:
naughty_words = ["",'"Sign up"', "kindle", "read", "book", "amzn", "amazon"]
good_words = ["Giveaway", ""]
filter = "OR".join(good_words)
blacklist = "-".join(naughty_words)
keywords = filter + blacklist
search_results = twitter.search(q="keywords", count= 5)
try:
    for tweet in search_results["statuses"]:
        twitter.retweet(id = tweet["id_str"])
        time.sleep(15)
except TwythonError as e:
    print e

I see two issues; fix those and see if that solves your problem.
1) keywords isn't being built as expected. From your code as it stands I get GiveawaySign up -kindle -read -book -amzn -amazon. This is because good_words is effectively a one-element list (the second entry is an empty string), so the .join isn't working as expected.
2) The way "Sign Up" is written, it will show up as "Sign" AND "up"; that's more likely the problem.
Try the following:
naughty_words = ["",'"Sign up"', "kindle", "read", "book", "amzn", "amazon"]
good_words = ["Giveaway", ""]
Also, remove the space after OR and keep the one before.
Edit
Change your filter and blacklist to:
filter = "".join(good_words)
blacklist = " -".join(naughty_words)
Since you only have one word in good_words, there's no need for the OR. You should get:
Giveaway -"Sign up" -kindle -read -book -amzn -amazon

Related

Using Insert with a large multi-layered table using Lua

I am working on a script for GTA5 and I need to transfer data over to a JS script. To avoid sending multiple arrays to JS, I need a single table; its template is shown below.
The issue I'm having at the moment is in the second section, where I receive all vehicles and loop through each one to add it to 'vehicleTable'. I haven't been able to find an example of "table.insert" being used on a multi-layered table.
So far I've tried the following
table.insert(vehicleTable,vehicleTable[class][i][vehicleName])
This seems to store an 'object'(table)? so it does not show up when called in the latter for loop
Next,
vehicleTable = vehicleTable + vehicleTable[class][i][vehicleName]
This seemed like it was going nowhere, as I either got an error or nothing happened.
Next,
table.insert(vehicleTable,class)
table.insert(vehicleTable[class],i)
table.insert(vehicleTable[class][i],vehicleName)
This one failed on the second line. I'm unsure why, but it never even reached the next problem I anticipated: line 3 has no way to specify the "Name" field.
Lastly the current one,
local test = {[class] = {[i]={["Name"]=vehicleName}}}
table.insert(vehicleTable,test)
It works without errors, but ultimately it doesn't file the entry into the existing table; instead it seems to create its own branch, an object within the object.
After about 3 hours of zero progress on this topic, I turn to Stack Overflow for assistance.
local vehicleTable = {
    ["Sports"] = {
        [1] = {["Name"] = "ASS", ["Hash"] = "Asshole2"},
        [2] = {["Name"] = "ASS2", ["Hash"] = "Asshole1"}
    },
    ["Muscle"] = {
        [1] = {["Name"] = "Sedi", ["Hash"] = "Sedina5"}
    },
    ["Compacts"] = {
        [1] = {["Name"] = "MuscleCar", ["Hash"] = "MCar2"}
    },
    ["Sedan"] = {
        [1] = {["Name"] = "Blowthing", ["Hash"] = "Blowthing887"}
    }
}
local vehicles = GetAllVehicleModels();
for i=1, #vehicles do
    local class = vehicleClasses[GetVehicleClassFromName(vehicles[i])]
    local vehicleName = GetLabelText(GetDisplayNameFromVehicleModel(vehicles[i]))
    print(vehicles[i].. " " .. class .. " " .. vehicleName)
    local test = {[class] = {[i]={["Name"]=vehicleName}}}
    table.insert(vehicleTable,test)
end
for k in pairs(vehicleTable) do
    print(k)
    -- for v in pairs(vehicleTable[k]) do
    --     print(v .. " " .. #vehicleTable[k])
    -- end
end
If there is no way to add to a library / table, how would I go about sorting all this without needing to send a million (hash, name, etc...) requests to JS?
Any recommendations or support would be much appreciated.
Aside from the fact that you do not provide the definitions of several functions and tables used in your code, which would be necessary to give a complete answer without making assumptions, there are many misconceptions here regarding very basic topics in Lua.
The most prominent is that you don't know how to use table.insert and what it can do. It inserts a value at a numeric index of a table (appending by default). Given that you have non-numeric keys in your vehicleTable, this doesn't make much sense.
You also don't know how to use the + operator and that it does not make any sense to add a table and a string.
Most of your code seems to be the result of guess work and trial and error.
Instead of referring to the Lua manual so you know how to use table.insert and how to index tables properly you spend 3 hours trying all kinds of variations of your incorrect code.
Assuming a vehicle model is a table like {["Name"] = "MyCar", ["Hash"] = "MyCarHash"} you can add it to a vehicle class like so:
table.insert(vehicleTable["Sedan"], {["Name"] = "MyCar", ["Hash"] = "MyCarHash"})
This makes sense because vehicleTable.Sedan has numeric indices. And after that line it would contain 2 cars.
Read the manual. Then revisit your code and fix your errors.

Can't Identify Proper CSS Selector to Scrape with Mechanize

I have built a web scraper that is successfully pulling almost everything I need out of the web page I'm looking at. The goal is to pull the URL for a particular image associated with all the coffees found at a particular URL.
The rake task I have defined to complete the scraping is as follows:
mechanize = Mechanize.new
mechanize.get(url) do |page|
  page.links_with(:href => /products/).each do |link|
    coffee_page = link.click
    bean = Bean.new
    bean.acidity = coffee_page.css('[data-id="acidity"]').text.strip.gsub("acidity ","")
    bean.elevation = coffee_page.css('[data-id="elevation"]').text.strip.gsub("elevation ","")
    bean.roaster_id = "2"
    bean.harvest_season = coffee_page.css('[data-id="harvest"]').text.strip.gsub("harvest ","")
    bean.price = coffee_page.css('.price-wrap').text.gsub("$","")
    bean.roast_profile = coffee_page.css('[data-id="roast"]').text.strip.gsub("roast ","")
    bean.processing_type = coffee_page.css('[data-id="process"]').text.strip.gsub("process ","")
    bean.cultivar = coffee_page.css('[data-id="cultivar"]').text.strip.gsub("cultivar ","")
    bean.flavor_profiles = coffee_page.css('.price-wrap+ p').text.strip
    bean.country_of_origin = coffee_page.css('#pdp-order h1').text.strip
    bean.image_url = coffee_page.css('img data-featured-product-image').attr('src')
    if bean.country_of_origin == "Origin Set" || bean.country_of_origin == "Gift Card (online use only)"
      bean.destroy
    else
      ap bean
    end
  end
end
Now the information I need is all on the page, and I'm looking for the image URL that is found like the below, but for all the individual coffee_pages at the source page. It needs to be generic enough to pull this picture source but nothing else. I've tried a number of different css selectors but everything pulls either nil or blank.
<img src="//cdn.shopify.com/s/files/1/2220/0129/products/ceremony-product-gummy-bears_480x480.jpg?v=1551455589" alt="Burundi Kiryama" data-product-featured-image style="display:none">
The coffee_page I'm on is here: https://shop.ceremonycoffee.com/products/burundi-kiryama
You need to change
bean.image_url = coffee_page.css('img data-featured-product-image').attr('src')
to
bean.image_url = coffee_page.css('#mobile-only>img').attr('src')
If you can, always use nearby identifiers to locate the element you want to access.

undefined method `click' for "2":String, Rails error when using Mechanize

class ScraperController < ApplicationController
  def show
    mechanize = Mechanize.new
    website = mechanize.get('https://website.com/')
    $max = 2
    $counter = 0
    $link_to_click = 2
    @names = []
    while $counter <= $max do
      @names.push(website.css('.memName').text.strip)
      website.link_with(:text => '2').text.strip.click
      $link_to_click += 1
      $counter += 1
    end
  end
end
I am trying to scrape 20 items off of each page and then click on the link at the bottom (1, 2, 3, 4, 5, etc.). However, I get the error seen in the title, which tells me that I cannot click the string. So it recognizes that the button '2' exists but tells me it cannot click it. Ideally, once this is sorted out, I want to use the $link_to_click variable in place of the '2' so that it increments each time, but it always comes back as nil. I have also changed it to .to_s with the same result.
If I remove the click altogether, it scrapes the same page 3 times instead of moving on to the next page. I have also removed the text.strip part before the .click, and it does the same thing. I have tried many variations but have had no luck.
I would really appreciate any advice you could offer.
I ended up reviewing the articles I was referencing to solve this and came to this conclusion.
I changed the link_with line to website = website.link_with(:text => $link_to_click.to_s).click (because it only worked as a string), and it printed out the first page, the second, and each one thereafter.
These are the articles that I was referencing to learn how to do this.
http://docs.seattlerb.org/mechanize/GUIDE_rdoc.html
and
https://readysteadycode.com/howto-scrape-websites-with-ruby-and-mechanize

Extract params values from link, Rails 4

I have a link like this:
http://localhost:3000/sms/receive/sms-id=7bb28e244189f2cf36cbebb9d1d4d02001da53ab&operator-%20id=1&from=37126300682&to=371144&text=RV9+c+Dace+Reituma+0580913
I want to extract all the different variable values from this link, for example sms-id, operator, from, to and text.
So far I have this:
routes.rb
get 'sms/receive/:params', to: 'sms#receive'
SMS#RECEIVE controller
def receive
query = params[:params]
sms_id= query[/["="].+?[&]/]
flash[:notice] = sms_id
end
This gives me =7bb28e244189f2cf36cbebb9d1d4d02001da53ab& but I need it without the leading = and the trailing & characters.
If I try to add strings, as in query[/["sms-id"].+?[&operator]/], which I hoped would let me extract all the variables smoothly, I get the error: empty range in char class: /["sms-id"].+?[&operator]/
Is there another way to extract all these variable values?
Thanks in advance!
The error in your regular expression occurs because - is a special character between square brackets, where it denotes a range (here s-i, which is empty). In this context, it must be escaped with a backslash: \-.
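A minimal sketch of that behaviour, using only core Ruby. The helper name compiles? is made up for illustration. It also shows that escaping the hyphen only makes the pattern valid: a bracket expression still matches a single character from the set, not the literal text sms-id.

```ruby
# Build the patterns from strings so the invalid one fails at runtime
# (written as a regexp literal it would be rejected when the file is parsed).
# compiles? is a hypothetical helper, not part of any library.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

puts compiles?('["sms-id"]')   # false: the range s-i is empty
puts compiles?('["sms\-id"]')  # true: the hyphen is now a literal character

# Even when it compiles, the class matches ONE character from the set.
puts "from"[Regexp.new('["sms\-id"]')]  # m
```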
To parse your query string, you can do this:
sms_id = params[:params].match(/sms-id=([^&]*)/)[1]
or parse it with the more generic method:
parsed_query = Rack::Utils.parse_nested_query(params[:params])
sms_id = parsed_query['sms-id']
(quoted from this answer)
If you have control over the initial URL, change the last / for a ? for an even easier solution:
http://localhost:3000/sms/receive?sms-id=7bb28e244189f2cf36cbebb9d1d4d02001da53ab&operator-%20id=1&from=37126300682&to=371144&text=RV9+c+Dace+Reituma+0580913
and you will have sms-id in params:
sms_id = params['sms-id']
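If Rack is not at hand (for example when trying this outside Rails), a rough equivalent for a flat query string can be sketched with the standard library's URI.decode_www_form. This is a swapped-in technique, not what the answer above uses, and parse_sms_query is a hypothetical helper name.

```ruby
require "uri"

# Hypothetical helper: turn the raw segment captured by the :params route
# into a Hash of decoded keys and values.
def parse_sms_query(raw)
  URI.decode_www_form(raw).to_h
end

raw = "sms-id=7bb28e244189f2cf36cbebb9d1d4d02001da53ab&operator-%20id=1" \
      "&from=37126300682&to=371144&text=RV9+c+Dace+Reituma+0580913"
fields = parse_sms_query(raw)

puts fields["sms-id"]
puts fields["operator- id"]  # "%20" in the key is decoded to a space
puts fields["text"]          # "+" is decoded to a space
```

Note that the key in the original link is operator-%20id, so after decoding it contains a space; that looks like a typo in the URL worth fixing at the source.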
You need the route
get 'sms/receive/', to: 'sms#receive'
in routes.rb, and then read the params in the controller.
Try this
matches = params[:params].scan(/(?:=)([\w\+]+)(?:\&)?/)
# this will make matches = [[first_match], [second_match], ..., [nth_match]]
# now you can read all matches
sms_id = matches[0][0]
operator_id = matches[1][0]
from = matches[2][0]
to = matches[3][0]
text = matches[4][0]
# and it will not contain = or &
I suggest putting this in a model or helper method rather than writing all of the code in the controller.
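The positional lookup above depends on the parameters always arriving in the same order. A small variation, sketched here with a hypothetical helper named scan_pairs, captures the key together with the value so each field can be looked up by name. It assumes a flat key=value&key=value string and does not decode anything.

```ruby
# Hypothetical helper: capture each "key=value" pair instead of only the values.
def scan_pairs(raw)
  raw.scan(/([^&=]+)=([^&]*)/).to_h
end

raw = "sms-id=7bb28e244189f2cf36cbebb9d1d4d02001da53ab&operator-%20id=1" \
      "&from=37126300682&to=371144&text=RV9+c+Dace+Reituma+0580913"
fields = scan_pairs(raw)

puts fields["sms-id"]
puts fields["from"]
# Values stay encoded here: "+" is not turned into a space.
puts fields["text"]
```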

Ruby, gsub and regex

Quick background: I have a string which contains references to other pages. The pages are linked to using the format "#12": a hash followed by the ID of the page.
Say I have the following string:
str = 'This string links to the pages #12 and #125'
I already know the IDs of the pages that need linking:
page_ids = str.scan(/#(\d*)/).flatten
=> ["12", "125"]
How can I loop through the page ids and link the #12 and #125 to their respective pages? The problem I've run into is if I do the following (in rails):
page_ids.each do |id|
  str = str.gsub(/##{id}/, link_to("##{id}", page_path(id)))
end
This works fine for #12 but it links the "12" part of #125 to the page with ID of 12.
Any help would be awesome.
Instead of extracting the ids first and then replacing them, you can simply find and replace them in one go:
str = str.gsub(/#(\d*)/) { link_to("##{$1}", page_path($1)) }
Even if you can't leave out the extraction step because you need the ids somewhere else as well, this should be much faster, since it doesn't have to go through the entire string for each id.
PS: If str isn't referred to from anywhere else, you can use str.gsub! instead of str = str.gsub
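To try the single-pass approach outside Rails, here is a minimal sketch. link_to and page_path are simplified stand-ins written for this example, not the real Rails helpers, and the /pages/ URL shape is an assumption. The sketch uses \d+ rather than \d* so that a bare "#" with no digits after it is left alone.

```ruby
# Simplified stand-ins for the Rails helpers so the sketch runs on its own.
def page_path(id)
  "/pages/#{id}"
end

def link_to(text, href)
  %(<a href="#{href}">#{text}</a>)
end

# Hypothetical wrapper around the one-pass replacement.
def link_page_refs(str)
  # The block runs once per match; $1 holds the digits captured for that match.
  str.gsub(/#(\d+)/) { link_to("##{$1}", page_path($1)) }
end

puts link_page_refs("This string links to the pages #12 and #125")
```

Because each match is replaced exactly once, "#125" is never partially rewritten by the replacement for "#12".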
If your IDs always end at word boundaries, you can match that:
page_ids.each do |id|
  str = str.gsub(/##{id}\b/, link_to("##{id}", page_path(id)))
end
You only need to add the word boundary symbol \b to the search pattern; it is not necessary in the replacement.
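The effect of \b can be seen in isolation with plain strings. In this sketch, mark_ref is a hypothetical helper and the bracketed replacement text is just a placeholder for the generated link.

```ruby
# Hypothetical helper: replace "#<id>" only when the id ends at a word boundary.
def mark_ref(str, id)
  str.gsub(/##{id}\b/, "[page #{id}]")
end

str = "pages #12 and #125"

puts str.gsub(/#12/, "[page 12]")  # pages [page 12] and [page 12]5
puts mark_ref(str, "12")           # pages [page 12] and #125
```

One caveat: \b treats letters and underscores as word characters too, so "#12a" would not match; if that matters, a negative lookahead such as (?!\d) is an alternative.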
