I'm trying to mimic what a previous developer did to parse an XML file in my Rails app and am stuck. From what I can tell, my job completes, but nothing is being posted as it should be, so I'm guessing my parsing file is incorrect (however, it works fine when testing with the raw file on my localhost). So, where am I going wrong here?
This is Sidekiq log output, just to confirm job is happening and not showing any errors in processing:
2016-05-25T13:51:04.499Z 8977 TID-oxs3s9lng ParseTestData JID-2a01971539c887cac3bf3374:1 INFO: start
2016-05-25T13:51:04.781Z 8977 TID-oxs3s9l3g GenerateNotifications JID-2a01971539c887cac3bf3374:2 INFO: start
2016-05-25T13:51:04.797Z 8977 TID-oxs3s9lng ParseTestData JID-2a01971539c887cac3bf3374:1 INFO: done: 0.297 sec
2016-05-25T13:51:04.824Z 8977 TID-oxs3s9l3g GenerateNotifications JID-2a01971539c887cac3bf3374:2 INFO: done: 0.043 sec
This is my Sidekiq job file, which iterates through the compressed files that get submitted through my API. The file in question that I'm working on is nmap_poodle_scan.xml:
class ParseTestData
  include Sidekiq::Worker

  # Order matters. Parse network hosts first to ensure we uniquely identify
  # network hosts by their MAC address.
  PARSERS = {
    "network_hosts.xml" => Parsers::NetworkHostParser,
    "nmap_tcp_service_scan.xml" => Parsers::TcpServiceScanParser,
    "nmap_shellshock_scan.xml" => Parsers::ShellshockScanParser,
    "hydra.out" => Parsers::HydraParser,
    "events.log" => Parsers::EventParser,
    "nmap_poodle_scan.xml" => Parsers::PoodleScanParser
  }

  def perform(test_id)
    test = Test.find(test_id)
    gzip = if Rails.env.development?
      Zlib::GzipReader.open(test.data.path)
    else
      file = Net::HTTP.get(URI.parse(test.data.url))
      Zlib::GzipReader.new(StringIO.new(file))
    end

    # Collect entries from the tarball
    entries = {}
    tar_extract = Gem::Package::TarReader.new(gzip)
    tar_extract.rewind
    tar_extract.each do |entry|
      entries[File.basename(entry.full_name)] = entry.read
    end

    # Preserve parse order by using the parser hash to initiate parser executions.
    PARSERS.each_pair do |filename, parser|
      next unless entry = entries[filename]
      parser.run!(test, entry)
    end
  end
end
Which grabs nmap_poodle_scan.xml:
<host starttime="1464180941" endtime="1464180941"><status state="up" reason="arp-response" reason_ttl="0"/>
<address addr="10.10.10.1" addrtype="ipv4"/>
<address addr="4C:E6:76:3F:2F:77" addrtype="mac" vendor="Buffalo.inc"/>
<hostnames>
<hostname name="DD-WRT" type="PTR"/>
</hostnames>
Nmap scan report for DD-WRT (10.10.10.1)
<ports><extraports state="closed" count="996">
<extrareasons reason="resets" count="996"/>
</extraports>
<table key="CVE-2014-3566">
<elem key="title">SSL POODLE information leak</elem>
<elem key="state">VULNERABLE</elem>
<table key="ids">
<elem>OSVDB:113251</elem>
<elem>CVE:CVE-2014-3566</elem>
</table>
<table key="description">
<elem> The SSL protocol 3.0, as used in OpenSSL through 1.0.1i and
other products, uses nondeterministic CBC padding, which makes it easier
for man-in-the-middle attackers to obtain cleartext data via a
padding-oracle attack, aka the "POODLE" issue.</elem>
</table>
<table key="dates">
<table key="disclosure">
<elem key="year">2014</elem>
<elem key="month">10</elem>
<elem key="day">14</elem>
</table>
</table>
<elem key="disclosure">2014-10-14</elem>
<table key="check_results">
<elem>TLS_RSA_WITH_3DES_EDE_CBC_SHA</elem>
</table>
<table key="refs">
<elem>https://www.imperialviolet.org/2014/10/14/poodle.html</elem>
<elem>http://osvdb.org/113251</elem>
<elem>https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-3566</elem>
<elem>https://www.openssl.org/~bodo/ssl-poodle.pdf</elem>
</table>
</table>
</script></port>
</ports>
<times srtt="4665" rttvar="556" to="100000"/>
</host>
Which should submit to PoodleScanParser:
module Parsers
  class PoodleScanParser < NmapScanParser
    def self.run!(test, content)
      super(test, content, "//host//ports[.//elem[@key='state'][contains(text(), 'VULNERABLE')]]") do |host, network_host_test|
        logger.info "Something cool"
        IssueFinder.match(cve_id: "CVE-2014-3566").each do |issue|
          Result.generate!(network_host_test.id, issue.id)
        end
      end
    end
  end
end
Which inherits from NmapScanParser. This parser is confirmed to work fine, so I know it's not the issue:
module Parsers
  class NmapScanParser
    def self.run!(test, content, xpath)
      document = Nokogiri::XML(content)
      document.remove_namespaces!
      document.xpath(xpath).each do |host|
        ip_address = host.at_xpath("address[@addrtype='ipv4']").at_xpath("@addr").value
        vendor = host.at_xpath("address[@addrtype='mac']").at_xpath("@vendor").value rescue "Unknown"
        hostname = host.at_xpath("hostnames/hostname").at_xpath("@name").value rescue "Hostname Unknown"
        os = host.at_xpath("os/osmatch").at_xpath("@name").value rescue "Unknown"
        os_vendor = host.at_xpath("os/osmatch/osclass").at_xpath("@vendor").value rescue "Unknown"
        network_host_test = NetworkHostTest.generate!(test, ip_address: ip_address, hostname: hostname, vendor: vendor, os: os, os_vendor: os_vendor)
        # If we didn't find a network host, that's because our network_hosts file didn't have this entry.
        next unless network_host_test
        yield(host, network_host_test)
      end
    end
  end
end
I've confirmed the parser works on my localhost against the same raw output as above, using a plain Ruby file and running ruby poodle_parser.rb:
require 'nokogiri'

document = Nokogiri::XML(File.open("poodle_results.xml"))
document.remove_namespaces!
document.xpath("//host[.//elem/@key='state']").each do |host|
  ip_address = host.at_xpath("address[@addrtype='ipv4']").at_xpath("@addr").value
  result = host.at_xpath("//ports//elem[@key='state']").content
  puts "#{ip_address} #{result}"
end
Which outputs what I would expect in terminal:
10.10.10.1 VULNERABLE
So, in the end, I expect a Result to be generated, but it's not. I'm not seeing any errors in the Rails log on my localhost, and nothing in the Sidekiq logs indicates an error either!
I decided to add a logger.info line to my PoodleScanParser to see if the Parser is even running as it should be. Assuming I did this correctly, the Parser doesn't look like it's running.
Well, the answer has nothing to do with Sidekiq. It was the output itself that Nokogiri was dying on: it turns out Nmap was adding a non-XML line, "Starting Nmap 7.12", at the beginning of the XML file, and Nokogiri was simply failing there.
I guess the moral of the story is to make sure your XML output is what Nokogiri expects it to be!
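For anyone hitting the same thing, here is a minimal sketch of a defensive check, assuming the entries are plain strings as in the job above: strip anything before the first XML tag and inspect Nokogiri's error list rather than letting a bad parse fail silently.
content = File.read("poodle_results.xml")

# Drop any non-XML banner (e.g. "Starting Nmap 7.12") before the first tag.
xml_start = content.index("<")
content = content[xml_start..-1] if xml_start && xml_start > 0

document = Nokogiri::XML(content)
# Nokogiri doesn't raise on malformed XML by default; it records errors here.
unless document.errors.empty?
  warn "XML parse errors: #{document.errors.map(&:message).join('; ')}"
end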
I have a rake task which loops over pages of a card game database and checks for the cards in each deck. Until recently this was working fine (it has checked 34,000 pages of 25 decks each, no problem), but recently it has stopped working when I run the rake task and I get the error:
JSON::ParserError: 765: unexpected token at ''
In order to debug this I have tried running each line of the GET request and JSON parse manually in the rails console, and it works fine every time. Weirder still, I have installed pry, and it works every time I step through the JSON parse manually with pry (it takes ages, though).
Here is the rake task:
desc "Create Cards"
require 'net/http'
require 'json'

task :create_cards => :environment do
  # Get the total number of pages of decks
  uri = URI("https://www.keyforgegame.com/api/decks/")
  response = Net::HTTP.get(URI(uri))
  json = JSON.parse(response)
  deck_count = json["count"]

  # Set variables
  page_number = 1
  page_size = 25 # 25 is the max page size
  page_limit = deck_count / 25
  card_list = Card.where(is_maverick: false)

  # Updates Card List (non-mavericks) - there are 740 cards so we stop when we have that many
  # example uri: https://www.keyforgegame.com/api/decks/?page=1&page_size=30&search=&links=cards
  puts "Updating Card List..."
  until page_number > page_limit || Card.where(is_maverick: false).length == 740
    uri = URI("https://www.keyforgegame.com/api/decks/?page=#{page_number}&page_size=#{page_size}&search=&links=cards")
    response = Net::HTTP.get(URI(uri))
    json = JSON.parse(response) # task errors here!
    cards = json["_linked"]["cards"]
    cards.each do |card|
      unless Card.exists?(:card_id => card["id"])
        Card.create({
          card_id: card["id"],
          amber: card["amber"],
          card_number: card["card_number"],
          card_text: card["card_text"],
          card_title: card["card_title"],
          card_type: card["card_type"],
          expansion: card["expansion"],
          flavor_text: card["flavor_text"],
          front_image: card["front_image"],
          house: card["house"],
          is_maverick: card["is_maverick"],
          power: card["power"],
          rarity: card["rarity"],
          traits: card["traits"],
        })
      end
    end
    puts "#{page_number}/#{page_limit} - Cards: #{Card.where(is_maverick: false).length}"
    page_number = (page_number + 1)
  end
end
The first json parse where it gets the total number of pages of decks works okay. It's the json parse in the until block that is failing (I've marked the line with a comment to that effect).
As I say, if I try this in the console it works fine and I can parse the json without error, literally copying and pasting the lines from the file into the rails console.
Since you're looping over an API, it's possible there are rate limits. Public APIs normally have per-second rate limits. You could try adding a sleep to slow down your requests; I'm not sure how many you're making per second. I tested with a simple loop, and it looks like the response returns an empty string if you hit the API too fast.
url = 'https://www.keyforgegame.com/api/decks/?page=1&page_size=30&search=&links=cards'
uri = URI(url)
i = 1
1000.times do
  puts i.to_s
  i += 1
  response = Net::HTTP.get(URI(uri))
  begin
    j = JSON.parse(response)
  rescue
    puts response
    #=> ""
  end
end
When I played with this, the loop started returning empty strings after the third request; adding sleep 5 inside each iteration got it working, so you can probably add that as the first line inside your loop. But you should also add error handling to your rake task in case you encounter any other API errors — see the sketch after the snippet below.
So for now you can probably just do this
until page_number > page_limit || Card.where(is_maverick: false).length == 740
  sleep 5
  # rest of your loop code, maybe add a rescue like I've shown
end
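And a minimal sketch of that error handling, layered on the same loop body (the three-attempt limit and the growing sleep are arbitrary choices on my part, not values from the API's documentation):
attempts = 0
begin
  response = Net::HTTP.get(uri)
  json = JSON.parse(response)
rescue JSON::ParserError
  attempts += 1
  raise if attempts > 3  # give up after a few tries
  sleep 5 * attempts     # back off a bit longer on each empty response
  retry
end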
I am trying to open an image URL from ActiveStorage, but Ruby hangs, and the last entry in the console log is:
Started GET "/rails/active_storage/blobs/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBCdz09IiwiZXhwIjpudWxsLCJwdXIiOiJibG9iX2lkIn19--bfec3dc0d0745e49d7daa9faac7abdeaa46418c3/f2392748480.png" for 127.0.0.1 at 2018-12-02 14:45:01 -0600
and when it ends the error is: Net::ReadTimeout
This didn't happen the first time, but after that it happens every time I run the method.
My method:
def upload_photos
  require 'open-uri'
  page_api = Koala::Facebook::API.new(params[:access_token])
  image_urls = params[:images].map do |image|
    open("#{request.protocol}#{request.host_with_port}#{image[:url]}") do |f|
      page_api.put_picture(f, f.content_type, { "caption" => image[:filename] }, params[:id])
    end
  end
  render json: { image_urls: image_urls, album_id: params }
end
It works the first time but not after that.
UPDATE: this works fine with external URLs, but not with ActiveStorage.
This is because Ruby is busy handling the current request: when it tries to fetch the ActiveStorage URL from the same single-threaded server, that server can't respond, since it's still busy serving the request that's doing the fetching. It's like trying to see the back of your own head in a mirror.
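One way around the self-request is to read the blob straight out of ActiveStorage instead of making an HTTP call back to your own server. A minimal sketch, assuming the images are ActiveStorage blobs and that a signed blob id is available in the params (the image[:signed_id] parameter here is hypothetical):
def upload_photos
  page_api = Koala::Facebook::API.new(params[:access_token])
  image_urls = params[:images].map do |image|
    # Look the blob up directly; no HTTP round-trip to our own server.
    blob = ActiveStorage::Blob.find_signed(image[:signed_id]) # hypothetical param
    io = StringIO.new(blob.download)
    page_api.put_picture(io, blob.content_type, { "caption" => image[:filename] }, params[:id])
  end
  render json: { image_urls: image_urls }
end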
I'm having a strange issue where when I check the File.size of a particular file in Rails console, it returns the correct size. However when I run the same code in a rake task, it returns 0. Here is the code in question (I've tidied it up a bit to help with readability):
def sum_close
  daily_closed_tickets = Fst.sum_retrieve_closed_tickets
  daily_closed_tickets.each do |ticket|
    CSV.open("FILE_NAME_HERE", "w+", {force_quotes: false}) do |csv|
      if (FileCopyReceipt.exists?(path: "#{ticket.attributes['TroubleTicketNumber']}_sum.txt"))
        csv << ["GENERATE CSV WITH ATTRIBUTES HERE"]
        files = Dir.glob("/var/www/html/harmonize/public/close/CLOSED_#{ticket.attributes['TroubleTicketNumber']}_sum.txt")
        files.each do |f|
          Rails.logger.info "File size (should return non-0): #{File.size(f)}" # returns 0, but not in Rails console
          Rails.logger.info "File size true or false, should be true: #{File.size(f) != 0}" # returns false, should return true
          Rails.logger.info "Rails Environment: #{Rails.env}" # returns production
          if (!FileCopyReceipt.exists?(path: f) && (File.size(f) != 0))
            Rails.logger.info("SUM CLOSE, GOOD => FileUtils.cp_r occurred and FileCopyReceipt object created")
          else
            Rails.logger.info("SUM CLOSE, WARNING: => no data transfer occurred")
          end
        end
      else
        Rails.logger.info("SUM CLOSE => DID NOT make it into initial if ClosedDate.present? if block")
      end
    end
  end
end
close_tickets.rake
task :close_tickets => :environment do
  tickets = FstController.new
  tickets.sum_close
  tickets.dais_close
end
It is beyond me why this File.size comes back as 0 when this is run as a rake task. I thought it may be a environment issue, but that does not seem to be the case.
Any insight on the matter is appreciated.
The CSV.open block, and everything being wrapped inside it, was causing the issue; most likely the "w+" mode truncates the file on open and CSV buffers its writes until the block exits, so File.size inside the block sees 0. So I just made CSV generation its own snippet instead of wrapping everything in it.
daily_closed_tickets.each do |ticket|
  CSV.open("generate csv here.txt") do |csv|
    # enter ticket.attributes here for the csv
  end
  # continue on with the rest of the code and File.size() works properly
end
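A quick way to see the behavior in isolation (the file name is just an example, and exact sizes depend on IO buffering):
require 'csv'

CSV.open("tickets_demo.csv", "w+") do |csv|
  csv << %w[id status]
  # "w+" truncated the file on open, and the row above may still be buffered,
  # so the size reported here is typically 0.
  puts File.size("tickets_demo.csv")
end
# After the block closes (and flushes) the file, the real size shows up.
puts File.size("tickets_demo.csv")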
I am getting seemingly random errors when uploading files to S3 from my app on Heroku. I am using jquery-file-upload to upload pictures to a tmp/ directory in my bucket using the CORS method and this code.
def url
  temp_url = AWS::S3::S3Object.url_for(
    s3_key,
    S3_CONFIG['bucket'],
    use_ssl: true)
  puts temp_url
  temp_url
  # temp_url.to_s.encode_signs
end

def delete_photo_from_s3
  begin
    photo = AWS::S3::S3Object.find(s3_key, S3_CONFIG['bucket'])
    photo.delete
  rescue Exception => e
    Rails.logger.error e.message
  end
end

private

def s3_key
  parent_url = self[:uri]
  # If the url is nil, there's no need to look in the bucket for it
  return nil if parent_url.nil?
  # This will give you the last part of the URL, the 'key' params you need
  # but it's URL encoded, so you'll need to decode it
  object_key = parent_url.split(/\//)
  "#{object_key[3]}/#{object_key[4]}/#{object_key[5]}"
end
From there I am using carrierwave to upload and process these images. However, sometimes the uploads fail silently and I am getting 403 Forbidden errors in my s3 bucket. Not sure what is causing this.
From there, I am using Qu to process a background job that attaches the image to CarrierWave via the remote_attachment_url call. Here is my background task:
class PhotoUploader
  def self.perform(finding_id, photo_id)
    begin
      finding = Finding.find(finding_id)
      photo = Photo.find(photo_id)
      upload = finding.uploads.build
      # attached_picture = photo.temp_image_url || photo.url
      upload.remote_attachment_url = photo.url
      if upload.save!
        Rails.logger.debug "#{Time.now}: Photo #{photo_id} saved to finding..."
        photo.set(:delete_at => 1.hour.from_now) # UTC, same as GMT (not local time!)
        photos = Photo.where(:processing => true, :delete_at.lte => Time.now.utc) # Query for UTC time, same type as previous line (also not local time!)
        finding.unset(:temp_image)
        if photos
          photos.each do |photo|
            photo.destroy
            Rails.logger.debug "Photo #{photo.id} - #{photo.uri} destroyed."
          end
        end
      else
        raise "Could not save to s3!"
      end
    rescue Exception => e
      Rails.logger.debug "#{Time.now}: PH01 - Error processing photo #{photo_id}, trying again... :: #{e.message}"
      retry
    end
  end
end
This works sometimes, but not always, which is really weird.
I end up getting a bunch of these errors in my s3 logs:
fc96aee492e463ff67c0a9835c23c81a09c4c36a53cdf297094ded3a7d02c62f actionlog-development [02/Dec/2012:20:27:18 +0000] 71.205.197.214 - 625CEFB5DB7867A7 REST.GET.OBJECT tmp/4f75d2fb4e484f2ffd000001/apcm_photomix1_0022.jpg "GET /actionlog-development/tmp/4f75d2fb4e484f2ffd000001/apcm_photomix1_0022.jpg?AWSAccessKeyId=AKIAI___ZA6A&Expires=1354480332&Signature=4wPc+nT84WEdOuxS6+Ry4iMNkys= HTTP/1.1" 403 SignatureDoesNotMatch 895 - 8 - "-" "Ruby" -
I have read about this a lot and it seems that people get this issue sometimes when there are unescaped '+'s in the signature. I'm not sure if this is a Carrierwave, Fog, or AWS::S3 issue.
If you could provide any assistance with this, it would be greatly appreciated.
Thanks.
Better to use the v4 signature; that should prevent this kind of error. Just add the option signature_version: :v4 to the url_for call.
temp_url = AWS::S3::S3Object.url_for(
  s3_key,
  S3_CONFIG['bucket'],
  use_ssl: true,
  signature_version: :v4)
It is a problem with Fog and Excon.
See this answer for how to fix it and switch to a better solution that uses the actual aws-sdk.
Library   | Disk Space | Lines of Code | Boot Time (s) | Runtime Deps | Develop Deps
fog       | 28.0M      | 133469        | 0.693         | 9            | 11
aws-sdk   | 5.4M       | 90290         | 0.098         | 3            | 8*
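For reference, a minimal sketch of generating the same kind of expiring GET URL with the modular aws-sdk-s3 gem, which signs with Signature Version 4 by default (the region and expiry here are assumptions, not values from the question):
require 'aws-sdk-s3'

s3  = Aws::S3::Resource.new(region: 'us-east-1') # assumed region
obj = s3.bucket(S3_CONFIG['bucket']).object(s3_key)
url = obj.presigned_url(:get, expires_in: 3600)  # expiry in seconds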
In a daemon that tracks the Twitter stream, I have this construction:
client.track(*hashtags) do |status|
  if status.coordinates != nil
    EventMachine.synchrony
    job = Qu.enqueue TweetProcessor, status
    puts "Enqueued tweet processing #{job.id}"
  end
end
For the queue library I'm using qu-mongo, and I have this config:
# /config/initializers/qu.rb
Qu.configure do |c|
  c.connection = Mongo::Connection.new('127.0.0.1').db("appname_qu")
end
I've tried many options, but it always results in IOError: closed stream.
This problem is related to this question; you can read more about it here. In my case, I just reassigned Rails' logger and Qu's logger to the same file at the beginning of the daemon's cycle and close it at the end, and everything works out just fine:
client.track(*hashtags) do |status|
  parser_logger = ActiveSupport::BufferedLogger.new(File.join(Rails.root, "log", "qu.log"))
  Rails.logger = parser_logger
  Qu.configure do |c|
    c.connection = Mongo::Connection.new.db("appname_qu")
    c.logger = parser_logger
  end
  job = Qu.enqueue TweetProcessor, status
  Rails.logger.close
end