How to output JSON in Rails without escaping backslashes - ruby-on-rails

I need to output some JSON for a customer in a somewhat unusual format. My app is written with Rails 5.
Desired JSON:
{
  "key": "\/Date(0000000000000)\/"
}
The timestamp value needs to have a \/ at both the start and end of the string. As far as I can tell, this seems to be a format commonly used in .NET services. I'm stuck trying to get the slashes to output correctly.
I reduced the problem to a vanilla Rails 5 application with a single controller action. All the permutations of escapes I can think of have failed so far.
def index
  render json: {
    a: '\/Date(0000000000000)\/',
    b: "\/Date(0000000000000)\/",
    c: '\\/Date(0000000000000)\\/',
    d: "\\/Date(0000000000000)\\/"
  }
end
Which outputs the following:
{
  "a": "\\/Date(0000000000000)\\/",
  "b": "/Date(0000000000000)/",
  "c": "\\/Date(0000000000000)\\/",
  "d": "\\/Date(0000000000000)\\/"
}
For the sake of discussion, assume that the format cannot be changed since it is controlled by a third party.
I have uploaded a test app to Github to demonstrate the problem. https://github.com/gregawoods/test_app_ignore_me

After some brainstorming with coworkers (thanks @TheZanke), we came upon a solution that works with the native Rails JSON output.
WARNING: This code overrides some core behavior in ActiveSupport. Use at your own risk, and apply judicious unit testing!
We tracked this down to the JSON encoding in ActiveSupport. Every string is eventually encoded via ActiveSupport::JSON.encode. We needed a way to short-circuit that logic and simply return the unencoded string.
First we extended the to_json method of EscapedString, which lives inside ActiveSupport::JSON::Encoding::JSONGemEncoder.
module EscapedStringExtension
  def to_json(*)
    if starts_with?('noencode:')
      "\"#{self}\"".gsub('noencode:', '')
    else
      super
    end
  end
end

module ActiveSupport::JSON::Encoding
  class JSONGemEncoder
    class EscapedString
      prepend EscapedStringExtension
    end
  end
end
Then in the controller we add a noencode: prefix to the value in the JSON hash. This tells our version of to_json not to do any additional encoding.
def index
  render json: {
    a: '\/Date(0000000000000)\/',
    b: 'noencode:\/Date(0000000000000)\/',
  }
end
The rendered output shows that b gives us what we want, while a preserves the standard behavior.
$ curl http://localhost:3000/sales/index.json
{"a":"\\/Date(0000000000000)\\/","b":"\/Date(0000000000000)\/"}

Meditate on this:
Ruby treats forward-slashes the same in double-quoted and single-quoted strings.
"/" # => "/"
'/' # => "/"
In a double-quoted string, "\/" means the \ is escaping the following character. Because / has no escaped equivalent, the result is a single forward-slash:
"\/" # => "/"
In a single-quoted string, a backslash followed by a character means, in all cases but one, a literal backslash followed by that character. The single exception is when you want to represent a backslash itself:
'\/' # => "\\/"
"\\/" # => "\\/"
'\\/' # => "\\/"
This is one of the more confusing parts of dealing with strings, and it isn't restricted to Ruby; escape sequences like these date from the early days of programming.
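If in doubt, String#length makes the difference concrete:
"\/".length # => 1 (a single forward-slash)
'\/'.length # => 2 (a backslash followed by a slash)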
Knowing the above:
require 'json'
puts JSON[{ "key": "\/value\/" }]
puts JSON[{ "key": '/value/' }]
puts JSON[{ "key": '\/value\/' }]
# >> {"key":"/value/"}
# >> {"key":"/value/"}
# >> {"key":"\\/value\\/"}
you should be able to make more sense of what you're seeing in your results and in the JSON output above.
I think the rules for this were originally created for C, so "Escape sequences in C" might help.

I think this is the simplest way:
.gsub("/", '//').gsub('\/', '')
For the input {:key=>"\\/Date(0000000000000)\\/"} (as printed), the first gsub gives you
{"key":"\\//Date(0000000000000)\\//"}
and the second gets you
{"key":"\/Date(0000000000000)\/"}
as you needed.
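A runnable sketch of this post-processing approach, applied to the hash from the question:
require 'json'

json = { key: '\/Date(0000000000000)\/' }.to_json
# json now contains escaped backslashes: {"key":"\\/Date(0000000000000)\\/"}

puts json.gsub("/", "//").gsub('\/', '')
# >> {"key":"\/Date(0000000000000)\/"}
Be aware that the first gsub doubles every slash in the document and the second strips every backslash-slash pair, so this is only safe when the JSON contains no other slashes or escape sequences.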

Related

How to parse JSON with the Oj SAX parser, Saj

I want to parse a 10-20MB JSON file, and figure it's probably a good idea to not parse the entire JSON file at once and cause major memory usage. After looking around it seems like Oj's Saj or ScHandler APIs might be a good fit.
The only problem is that I can't really wrap my head around how to use them, and the documentation doesn't make it much clearer. I've looked at the example in Saj source code, and defined a super simple subclass of Oj::Saj like below:
class MySaj < Oj::Saj
  def hash_start(key)
    p key
  end
end
Used like this:
open(URL) do |contents|
  Oj.saj_parse(handler, contents)
end
And this leads to a lot of keys from my JSON being printed out. But I still have no idea how to actually access the values belonging to the keys I'm printing.
Can I access the hash itself somehow, or how am I supposed to do this?
SAX-style parsing is complicated. You have to maintain the state of the parsing, and deal with each state change appropriately.
The hash_start and array_start callbacks notify your SAX handler that Saj has found the beginning of a hash or array, and that the callbacks that follow will occur in the context of that container. Note that hashes may be nested, and may contain (or be contained within) arrays or simple values.
Here is a simple Saj handler that parses a very simple JSON object:
require 'oj'

class MySaj < ::Oj::Saj
  def initialize
    @hash_cnt = 0
    @array_cnt = 0
  end

  def hash_start(key)
    @hash_cnt += 1
    puts "Start-Hash[#{@hash_cnt}]: '#{key}'"
  end

  def hash_end(key)
    @hash_cnt -= 1
    puts "End-Hash[#{@hash_cnt}]: '#{key}'"
  end

  def array_start(key)
    @array_cnt += 1
    puts "Start-Array[#{@array_cnt}]: '#{key}'"
  end

  def array_end(key)
    @array_cnt -= 1
    puts "End-Array[#{@array_cnt}]: '#{key}'"
  end

  def add_value(value, key)
    puts "Value: [#{key}] = '#{value}'"
  end

  def error(message, line, column)
    puts "ERROR: #{line}:#{column}: #{message}"
  end
end
json = '[{ "key1": "abc", "key2": 123}, { "test1": "qwerty", "pi": 3.14159 }]'
cnt = MySaj.new()
Oj.saj_parse(cnt, json)
Running this basic Saj parser gives this output:
Start-Array[1]: ''
Start-Hash[1]: ''
Value: [key1] = 'abc'
Value: [key2] = '123'
End-Hash[0]: ''
Start-Hash[1]: ''
Value: [test1] = 'qwerty'
Value: [pi] = '3.14159'
End-Hash[0]: ''
End-Array[0]: ''
You may notice that this output is roughly equivalent to one callback per token (omitting ',' and ':'). You essentially have to build into your callbacks the knowledge of what to do with individual JSON elements. Along those lines, you also need to build the hierarchy described by the callbacks. For example, when hash_start is called, push an empty hash on the stack; when hash_end is called, pop the hash or move back one level in the hierarchy.
For example, you could have a handler in hash_end that checks whether this is ending a top-level hash, and when it is, do something with that hash. Note that you often cannot do this with arrays, as the top-level element of a great many JSON documents is an array, so you have to determine when the array is the top+1 level array.
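As an illustration (a sketch of mine, not part of the original answer), here is a minimal Saj handler that uses such a stack to rebuild the whole document; in a real streaming scenario you would process and discard subtrees instead of keeping everything:
require 'oj'

class TreeBuilder < ::Oj::Saj
  attr_reader :result

  def initialize
    @stack = []
    @result = nil
  end

  def hash_start(key)
    push(key, {})
  end

  def array_start(key)
    push(key, [])
  end

  def hash_end(key)
    @stack.pop
  end

  def array_end(key)
    @stack.pop
  end

  def add_value(value, key)
    attach(key, value)
  end

  private

  # Attach a new container to the current level, then descend into it.
  def push(key, container)
    attach(key, container)
    @stack.push(container)
  end

  # Store a value under its key (hash parent), append it (array parent),
  # or treat it as the top-level result when the stack is empty.
  def attach(key, value)
    case (parent = @stack.last)
    when Hash  then parent[key] = value
    when Array then parent << value
    else @result = value
    end
  end
end

builder = TreeBuilder.new
Oj.saj_parse(builder, '[{"key1":"abc"},{"pi":3.14159}]')
p builder.result # => [{"key1"=>"abc"}, {"pi"=>3.14159}]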
If you like writing compiler backends, this is the JSON parsing solution for you. Personally, I've never enjoyed working in SAX, but for large documents it can be very resource-friendly and highly performant, depending on how well you write the handler. Be prepared for oodles of debugging and subtly mismatched state management, as that's par for the course with SAX-style parsing.
However, you shouldn't be too concerned with 10-20MB JSON, as that's actually not very large. I've processed 80+MB JSON with "regular" Oj (load and dump) quite a lot, and not had a problem with it. Unless you're running on a severely resource-constrained machine, the standard Oj will work well for you.
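For comparison, the non-streaming path is a one-liner (a sketch; 'big.json' is a placeholder path):
require 'oj'

# Loads the whole document into memory as nested hashes and arrays.
data = Oj.load(File.read('big.json'))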
Saj is a streaming parser. In practice that means it doesn't load a file's contents in their entirety and parse them whole; instead, it notifies you of parse events as it encounters them. Your thinking is solid: the larger the file, the more you benefit from parsing in that manner if you wish to pick and choose from it.
hash_start is one such event, fired when Oj sees the beginning of an Object (which will become a Hash in Ruby land).
Take this JSON for instance:
{
  "student-1": {
    "name": "John Doe",
    "age": 42,
    "knownAliases": ["Blabby Joe", "Stack Underflow"],
    "trainingGrades": {
      "Advanced Zumba Dancing": "A+",
      "Introduction to Twitter Arguments": "C-"
    }
  },
  "student-2": {
    "name": "Rebecca Melecca",
    "age": 26,
    "knownAliases": ["Booger Becca", "Tanktop Terror"],
    "trainingGrades": {
      "Intermediate Groin Kickery": "A+",
      "Advanced Quantum Mechanics": "A+"
    }
  }
}
And the following parser:
class StudentParser < Oj::Saj
  def hash_start(key)
    puts "hash_start(#{key.inspect})"
  end

  def hash_end(key)
    puts "hash_end(#{key.inspect})"
  end

  def array_start(key)
    puts "array_start(#{key.inspect})"
  end

  def array_end(key)
    puts "array_end(#{key.inspect})"
  end

  def add_value(value, key)
    puts "add_value(#{value.inspect}, #{key.inspect})"
  end
end
And you'll get the following sequence of events:
hash_start(nil)
hash_start("student-1")
add_value("John Doe", "name")
add_value(42, "age")
array_start("knownAliases")
add_value("Blabby Joe", nil)
add_value("Stack Underflow", nil)
array_end("knownAliases")
hash_start("trainingGrades")
add_value("A+", "Advanced Zumba Dancing")
add_value("C-", "Introduction to Twitter Arguments")
hash_end("trainingGrades")
hash_end("student-1")
hash_start("student-2")
add_value("Rebecca Melecca", "name")
add_value(26, "age")
array_start("knownAliases")
add_value("Booger Becca", nil)
add_value("Tanktop Terror", nil)
array_end("knownAliases")
hash_start("trainingGrades")
add_value("A+", "Intermediate Groin Kickery")
add_value("A+", "Advanced Quantum Mechanics")
hash_end("trainingGrades")
hash_end("student-2")
hash_end(nil)
When you see hash_start(nil), it means the parser has found a top-level object (that very first opening brace). Conversely, hash_end(nil) means that top-level object has been closed, and its innards properly parsed (i.e. no parsing errors have been found).
Parsing in this manner means you have to keep track of nesting, if that's meaningful to you, and of adding keys and values at the right level, et cetera. That makes it annoying and hard, but worthwhile if you wish to carve out bits of a large file without committing everything to memory.

Gemoji breaks Kramdown's HTML

Why does Kramdown's autolinking parser break when running it over a gemojified text field?
For [Test](http://google.com "Test") I'm getting:
<p><a href="http://google.com &quot;Test&quot;">Test</a></p>
instead of the expected output:
<p><a href="http://google.com" title="Test">Test</a></p>
Live app: http://runnable.com/VAL1VuMjrGFur2yx/forem-gemoji-kramdown (see the Test post)
application_helper.rb:
def add_emojify_and_kramdown(text)
  raw(Kramdown::Document.new(emojify(text)).to_html)
end

[...snip...]

def emojify(text)
  h(text).to_str.gsub(/:([a-z0-9\+\-_]+):/) do |match|
    if emoji = Emoji.find_by_alias($1)
      '![' + $1 + '](' + asset_path("emoji/#{emoji.image_filename}") + ')'
    else
      match
    end
  end
end
Some additional info:
raw(Kramdown::Document.new(text).to_html) returns the expected output, but without Gemoji
raw(emojify(text)) doesn't change anything, since text contains no emojis
raw(emojify(Kramdown::Document.new(text).to_html)) returns the expected output, but as raw HTML
The first thing your emojify method does is h(text), which HTML-escapes the input, converting
[Test](http://google.com "Test")
into
[Test](http://google.com &quot;Test&quot;)
Kramdown then operates on this string, and since it no longer contains quote marks it assumes the whole contents of (...) is the URL, producing:
<p><a href="http://google.com &quot;Test&quot;">Test</a></p>
To get it to work you just need to drop the call to h, i.e. start the method with text.gsub(.... You'll likely need to think about how to manage your string safety if this is external data.
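A minimal sketch of the helper with the h call dropped (otherwise unchanged from the question's code):
def emojify(text)
  text.gsub(/:([a-z0-9\+\-_]+):/) do |match|
    if emoji = Emoji.find_by_alias($1)
      '![' + $1 + '](' + asset_path("emoji/#{emoji.image_filename}") + ')'
    else
      match
    end
  end
end
Since the input is no longer escaped before Kramdown runs, make sure untrusted input is sanitized at some later stage.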

Performance implications of using :coffeescript filter inside HAML templates?

So HAML 4 includes a coffeescript filter, which allows us coffee-loving Rails people to do neat things like this:
- word = "Awesome."
:coffeescript
  $ ->
    alert "No semicolons! #{word}"
My question: For the end user, is this slower than using the equivalent :javascript filter? Does using the coffeescript filter mean the coffeescript will be compiled to javascript on every page load (which would obviously be a performance disaster), or does this only happen once when the application is started?
It depends.
When Haml compiles a filter it checks to see if the filter text contains any interpolation (#{...}). If there isn’t any then it will be the same text to transform on each request, so the conversion is done once at compile time and the result included in the template.
If there is interpolation in the filter text, then the actual text to transform will vary on each request, so the Coffeescript will need to be compiled each time.
Here’s an example. First with no interpolation:
:coffeescript
  $ ->
    alert "No semicolons! Awesome"
This generates the code (use haml -d to see the generated Ruby code):
_hamlout.buffer << "<script>\n (function() {\n $(function() {\n return alert(\"No semicolons! Awesome\");\n });\n \n }).call(this);\n</script>\n";
This code simply adds a string to the buffer, so no Coffeescript is being recompiled.
Now with interpolation:
- word = "Awesome."
:coffeescript
  $ ->
    alert "No semicolons! #{word}"
This generates:
word = "Awesome."
_hamlout.buffer << "#{
find_and_preserve(Haml::Filters::Coffee.render_with_options(
"$ ->
alert \"No semicolons! #{word}\"\n", _hamlout.options))
}\n";
Here, since Haml needs to wait to see what the value of the interpolation is, the Coffeescript is recompiled each time.
You can avoid compiling the Coffeescript on each request by not having any interpolation inside your :coffeescript filters.
The :javascript filter behaves similarly, checking to see if there is any interpolation, but since the :javascript filter only outputs some text to the buffer when it runs there is much less of a performance hit using it. You could possibly combine :javascript and :coffeescript filters, putting interpolated data in :javascript and keeping your :coffeescript static:
- word = "Awesome"
:javascript
  var message = "No semicolons! #{word}";
:coffeescript
  alert message
matt's answer is clear on what is going on. I made a helper to add locals to :coffeescript filters from a hash, so you don't need to use global JavaScript variables. As a side note: on Linux the slowdown is really negligible, but on Windows the performance impact is quite significant (easily more than 100ms per block to compile).
module HamlHelper
  def coffee_with_locals locals={}, &block
    block_content = capture_haml do
      block.call
    end

    return block_content if locals.blank?

    javascript_locals = "\nvar "
    javascript_locals << locals.map{ |key, value| j(key.to_s) + ' = ' + value.to_json.gsub('</', '<\/') }.join(",\n ")
    javascript_locals << ";\n"

    content_node = Nokogiri::HTML::DocumentFragment.parse(block_content)
    content_node.search('script').each do |script_tag|
      # This will match the '(function() {' at the start of coffeescript's compiled code
      split_coffee = script_tag.content.partition(/\(\s*function\s*\(\s*\)\s*\{/)
      script_tag.content = split_coffee[0] + split_coffee[1] + javascript_locals + split_coffee[2]
    end

    content_node.to_s.html_safe
  end
end
It allows you to do the following:
= coffee_with_locals "test" => "hello ", :something => ["monde", "mundo", "world"], :signs => {:interogation => "?", :exclamation => "!"} do
  :coffeescript
    alert(test + something[2] + signs['exclamation'])
Since there is no interpolation, the code is compiled once, as normal.

Regex in Ruby: expression not found

I'm having trouble with a regex in Ruby (on Rails). I'm relatively new to this.
The test string is:
http://www.xyz.com/017010830343?$ProdLarge$
I am trying to remove "$ProdLarge$". In other words, the $ signs and anything between.
My regular expression is:
\$\w+\$
Rubular says my expression is ok. http://rubular.com/r/NDDQxKVraK
But when I run my code, the app says it isn't finding a match. Code below:
some_array.each do |x|
  logger.debug "scan #{x.scan('\$\w+\$')}"
  logger.debug "String? #{x.instance_of?(String)}"
  x.gsub!('\$\w+\$','scl=1')
  ...
My logger debug line shows a result of "[]". String is confirmed as being true. And the gsub line has no effect.
What do I need to correct?
Use /regex/ instead of 'regex': a string argument to scan or gsub is matched as a literal substring, not as a regular expression:
> "http://www.xyz.com/017010830343?$ProdLarge$".gsub(/\$\w+\$/, 'scl=1')
=> "http://www.xyz.com/017010830343?scl=1"
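Applied to the loop from the question, a sketch with the string patterns swapped for regex literals:
some_array.each do |x|
  logger.debug "scan #{x.scan(/\$\w+\$/)}"
  x.gsub!(/\$\w+\$/, 'scl=1')
end
Note that gsub! returns nil when no substitution occurs, so don't chain off its return value.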
Don't use a regex for this task, use a tool designed for it, URI. To remove the query:
require 'uri'

url = URI.parse('http://www.xyz.com/017010830343?$ProdLarge$')
url.query = nil
puts url.to_s
# >> http://www.xyz.com/017010830343
To change to a different query, use this instead of url.query = nil:
url.query = 'scl=1'
puts url.to_s
# >> http://www.xyz.com/017010830343?scl=1
URI will automatically encode values if necessary, saving you the trouble. If you need even more URL management power, look at Addressable::URI.
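For example, with the addressable gem (a sketch; the gem must be installed separately):
require 'addressable/uri'

url = Addressable::URI.parse('http://www.xyz.com/017010830343?$ProdLarge$')
url.query_values = { 'scl' => '1' }
puts url.to_s
# >> http://www.xyz.com/017010830343?scl=1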

Evaluating a user-input expression in Rails

I need to accept a mathematical expression (including one or more unknowns) from the user and substitute values in for the unknowns to get a result.
I could use eval() to do this, but it's far too risky unless there is a way to recognise "safe" expressions.
I'd rather not write my own parser if I can help it.
I searched for a ready-made parser but the only one I found ( https://www.ruby-toolbox.com/gems/expression_parser , which seems to be the same as the one discussed at http://lukaszwrobel.pl/blog/math-parser-part-4-tests) seems to be limited to the "four rules" +-*/. I need to include exponential, log and trig functions at the very least.
Any suggestions?
UPDATE: EXAMPLE
include Math

def exp(x)
  Math.exp(x)
end

def cos(x)
  Math.cos(x)
end

pi = Math::PI
t = 2
string = '(3*exp(t/2)*cos(3*t-pi/2))'
puts eval(string)
UPDATE - a pre-parsing validation step
I think I will use this regex to check the string has the right kinds of tokens in it:
/((((cos\(|sin\()|(tan\(|acos\())|((asin\(|atan\()|(exp\(|log\())|ln\()(([+-\/*^\(\)A-Z]|\d)+))*([+-\/*^\(\)A-Z]|\d)+/
But I will still implement the parsing method during the actual evaluation.
Thanks for the help!
You can checkout the Dentaku gem - https://github.com/rubysolo/dentaku
You can use it to execute the user given formula.
Here is an example usage of this gem.
class FormulaExecutor
  def execute_my_formula(formula, params)
    calc = Dentaku::Calculator.new
    # Param 1 => formula to execute
    # Param 2 => Hash of local variables
    calc.evaluate(formula, params)
  end
end
FormulaExecutor.new.execute_my_formula("length * breadth", {'length' => 11, 'breadth' => 120}) # => 1320
If eval would work, then you could parse the expression using a Ruby parser (e.g. gem install ruby_parser), and then evaluate the S-expression recursively, either ignoring or raising an error on non-arithmetic functions. This probably needs some work, but it sounded like fun:
require 'ruby_parser'

def evaluate_arithmetic_expression(expr)
  parse_tree = RubyParser.new.parse(expr) # Sexp < Array
  return evaluate_parse_tree(parse_tree)
end

def evaluate_parse_tree(parse_tree)
  case parse_tree[0]
  when :lit
    # A literal number: s(:lit, 42)
    return parse_tree[1]
  when :call
    # A method call: s(:call, receiver, method_name, args)
    meth = parse_tree[2]
    if [:+, :*, :-, :/, :&, :|, :**].include? meth
      val = evaluate_parse_tree parse_tree[1]
      arglist = evaluate_parse_tree parse_tree[3]
      return val.send(meth, *arglist)
    else
      throw 'Unsafe' # abort on anything that isn't whitelisted arithmetic
    end
  when :arglist
    args = parse_tree[1..-1].map { |sexp| evaluate_parse_tree(sexp) }
    return args
  end
end
You should be able to enhance this to include things like cos, sin, etc. pretty easily. It works for some simple examples I tried, and it includes a free check for well-formedness (parsing raises a Racc::ParseError exception if the expression is malformed).
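As a hypothetical illustration of that enhancement (my sketch, not code from the answer): ruby_parser represents a bare function call such as cos(1) as s(:call, nil, :cos, s(:lit, 1)), i.e. a :call with a nil receiver, so those calls can be checked against a whitelist and dispatched to Math. Depending on your ruby_parser version, arguments may arrive wrapped in an :arglist node rather than as a bare sexp.
require 'ruby_parser'

SAFE_FUNCTIONS = [:cos, :sin, :tan, :exp, :log, :sqrt].freeze
SAFE_OPERATORS = [:+, :*, :-, :/, :**].freeze

def evaluate_math_tree(tree)
  case tree[0]
  when :lit
    tree[1]
  when :call
    meth = tree[2]
    if tree[1].nil? && SAFE_FUNCTIONS.include?(meth)
      # Bare call like cos(...): evaluate the argument, dispatch to Math.
      Math.send(meth, evaluate_math_tree(tree[3]))
    elsif SAFE_OPERATORS.include?(meth)
      evaluate_math_tree(tree[1]).send(meth, evaluate_math_tree(tree[3]))
    else
      raise "Unsafe method: #{meth}"
    end
  else
    raise "Unsupported node: #{tree[0]}"
  end
end

p evaluate_math_tree(RubyParser.new.parse('cos(3.0 * 2 - 1) + 2 ** 3'))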
Start with the assumption that eval doesn't exist unless you have a very tight grip on the evaluated content. Even if you don't parse, you could split all input into tokens and check that each is an acceptable token.
Here is a very crude way to check that the input contains nothing other than valid tokens. Lots of refactoring/improvements are possible.
include Math

def exp(x)
  Math.exp(x)
end

def cos(x)
  Math.cos(x)
end

pi = Math::PI
t = 2

a = %Q(3*exp(t/2)*cos(3*t-pi/2)) # input string
b = a.tr("/*)([0-9]-", '')       # remove all special single chars
b.gsub!(/(exp|cos|pi|t)/, '')    # remove all good tokens
eval(a) if b == ''               # eval if nothing other than good tokens remains
