RCurl and non-Roman UTF-8 characters in URL

RCurl and non-Roman UTF-8 characters in URL - url

I'm having trouble with the specific url below. The R code below generates an "invalid URL" error.
library(RCurl)
term <- "日本"
query_url <- paste("https://iss.ndl.go.jp/api/sru?operation=searchRetrieve&recordSchema=dcndl&recordPacking=xml&query=anywhere=",
term,"%20AND%20from%20=%201887%20AND%20until=%201887", sep="")
getURL(query_url, .encoding = "UTF8")
But when I paste the query_URL into a browser . .
https://iss.ndl.go.jp/api/sru?operation=searchRetrieve&recordSchema=dcndl&recordPacking=xml&query=anywhere=日本%20AND%20from%20=%201887%20AND%20until=%201887
. . . I get the desired XML. What am I missing?

Apparently encoding the single Japanese term is better for the URL:
library(RCurl)
term <- "日本"
term <- url_encode(term)
query_url <- paste("https://iss.ndl.go.jp/api/sru?operation=searchRetrieve&recordSchema=dcndl&recordPacking=xml&query=anywhere=", term,"%20AND%20from%20=%201887%20AND%20until=%201887", sep="")
# query_url <- Encoding(query_url) <- "UTF-8"
getURL(query_url, .encoding = "UTF8")

Related

How to specify headers for a web request

Trying to set headers for a http call but running into issues. Need guidance for both Authorization and a custom header x-api-key.
let url = "http://example.com"
let token = requestToken()
let request = WebRequest.Create(url) :?> HttpWebRequest
request.Method <- "GET"
request.Accept <- "application/json;charset=UTF-8"
request.Headers.Authorization <- sprintf "%s %s" token.token_type token.access_token
request.Headers["x-api-key"] <- "api-key" // custom headers
// or this
request.Headers["Authorization"] <- sprintf "%s %s" token.token_type token.access_token
The error I'm getting is
error FS3217: This expression is not a function and cannot be applied. Did you intend to access the indexervia expr.[index] instead?

The error message that you are getting actually tells you what the problem is. In F#, the syntax for using an indexer is obj.[idx] - you need to have a . between the object and the square bracket. The correct syntax in your specific case would be:
request.Headers.["x-api-key"] <- "api-key"
request.Headers.["Authorization"] <- sprintf "%s %s" token.token_type token.access_token

How to parse a sequence of " " separated values in Haskell

I am a beginner in Haskell and i need to parse a sequence of valuses spearate by something.
The following parser is generated with tokenparser:
m_semiSep1 p parses and returns a semicolon-separated sequence of one or more p's.
But i dont quite understand how it is created. I need one that returns a comma separated sequence of p`s. Can you give me a hint how can i do that. I also need to parse a sequence of "=|" separated values but i suppose that it will be the same as with the comma. This is the code i am working on:
def = emptyDef{ commentStart = "{-"
, commentEnd = "-}"
, identStart = letter
, identLetter = alphaNum
, opStart = oneOf "^~&=:-|,"
, opLetter = oneOf "^~&=:-|,"
, reservedOpNames = ["~", "&", "^", ":=", "|-", ","]
, reservedNames = ["true", "false", "nop",
"if", "then", "else", "fi",
"while", "do", "od"]
}
TokenParser{ parens = m_parens
, identifier = m_identifier
, reservedOp = m_reservedOp
, reserved = m_reserved
, semiSep1 = m_semiSep1
, whiteSpace = m_whiteSpace } = makeTokenParser def

You can use sepBy in parsec. sepBy cell deli parses something like cell deli cell deli...
For example:
Prelude> :m Text.ParserCombinators.Parsec
Prelude Text.ParserCombinators.Parsec> let csv = (many letter) `sepBy` (char ',') :: Parser [String]
Prelude Text.ParserCombinators.Parsec> parse csv "" "xx,yy,zz"
Right ["xx","yy","zz"]
https://hackage.haskell.org/package/parsec-3.1.9/docs/Text-Parsec-Combinator.html#v:sepBy

Scala parser failure handling, dangling commas

Getting started with Scala parser combinations, before moving on need to grasp failure/error handling better (note: still getting into Scala as well)
Want to parse strings like "a = b, c = d" into a list of tuples but flag the user when dangling commas are found.
Thought about matching off failure ("a = b, ") when matching comma separated property assignments:
def commaList[T](inner: Parser[T]): Parser[List[T]] =
rep1sep(inner, ",") | rep1sep(inner, ",") ~> opt(",") ~> failure("Dangling comma")
def propertyAssignment: Parser[(String, String)] = ident ~ "=" ~ ident ^^ {
case id ~ "=" ~ prop => (id, prop)
}
And call the parser with:
p.parseAll(p.commaList(p.propertyAssignment), "name = John , ")
which results in a Failure, no surprise but with:
string matching regex `\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*' expected but end of source found
The commList function succeeds on the first property assignment and starts repeating given the comma but the next "ident" fails on the fact that the next character is the end of the source data. Thought I could catch that 2nd alternative in the commList would match:
rep1sep(inner, ",") ~> opt(",") ~> failure("Dangling comma")
Nix. Ideas?

Scalaz to the rescue :-)
When you are working with warnings, it is not a good idea to exit your parser with a failure. You can easily combine the parser with the Scalaz writer monad. With this monads you can add messages to the partial result during the parser run. These messages could be infos, warnings or errors. After the parser finishes, you can then validate the result, if it can be used or if it contains critical problems. With such a separate vaildator step you get usual much better error messages. For example you could accept arbitrary characters at the end of the string, but issue an error when they are found (e.g. "Garbage found after last statement"). The error message can be much more helpful for the user than the cryptic default one you get in the example below ("string matching regex `\z' expected [...]").
Here is an example based on the code in your question:
scala> :paste
// Entering paste mode (ctrl-D to finish)
import util.parsing.combinator.RegexParsers
import scalaz._, Scalaz._
object DemoParser extends RegexParsers {
type Warning = String
case class Equation(left : String, right : String)
type PWriter = Writer[Vector[Warning], List[Equation]]
val emptyList : List[Equation] = Nil
def rep1sep2[T](p : => Parser[T], q : => Parser[Any]): Parser[List[T]] =
p ~ rep(q ~> p) ^^ {case x~y => x::y}
def name : Parser[String] = """\w+""".r
def equation : Parser[Equation] = name ~ "=" ~ name ^^ { case n ~ _ ~ v => Equation(n,v) }
def commaList : Parser[PWriter] = rep1sep(equation, ",") ^^ (_.set(Vector()))
def danglingComma : Parser[PWriter] = opt(",") ^^ (
_ map (_ => emptyList.set(Vector("Warning: Dangling comma")))
getOrElse(emptyList.set(Vector("getOrElse(emptyList.set(Vector(""))))
def danglingList : Parser[PWriter] = commaList ~ danglingComma ^^ {
case l1 ~ l2 => (l1.over ++ l2.over).set(l1.written ++ l2.written) }
def apply(input: String): PWriter = parseAll(danglingList, input) match {
case Success(result, _) => result
case failure : NoSuccess => emptyList.set(Vector(failure.msg))
}
}
// Exiting paste mode, now interpreting.
import util.parsing.combinator.RegexParsers
import scalaz._
import Scalaz._
defined module DemoParser
scala> DemoParser("a=1, b=2")
res2: DemoParser.PWriter = (Vector(),List(Equation(a,1), Equation(b,2)))
scala> DemoParser("a=1, b=2,")
res3: DemoParser.PWriter = (Vector(Warning: Dangling comma),List(Equation(a,1), Equation(b,2)))
scala> DemoParser("a=1, b=2, ")
res4: DemoParser.PWriter = (Vector(Warning: Dangling comma),List(Equation(a,1), Equation(b,2)))
scala> DemoParser("a=1, b=2, ;")
res5: DemoParser.PWriter = (Vector(string matching regex `\z' expected but `;' found),List())
scala>
As you can see, it handles the error cases fine. If you want to extend the example, add case classes for different kinds of errors and include the current parser positions in the messages.
Btw. the problem with the white spaces is handled by the RegexParsers class. If you want to change the handling of white spaces, just override the field whiteSpace.

Your parser isn't expecting the trailing whitespace at the end of "name = John , ".
You could use a regex to optionally parse "," followed by any amount of whitespace:
def commaList[T](inner: Parser[T]): Parser[List[T]] =
rep1sep(inner, ",") <~ opt(",\\s*".r ~> failure("Dangling comma"))
Note that you can avoid using alternatives (|) here, by making the failure part of the optional parser. If the optional part consumes some input and then fails, then the whole parser fails.

Whitespace sensitive FParsec

I'm trying to implement a whitespace sensitive parser using FParsec, and I'm starting off with the baby step of defining a function which will parse lines of text that start with n chars of whitespace.
Here's what I have so far:
let test: Parser<string list,int>
= let manyNSatisfy i p = manyMinMaxSatisfy i i p
let p = fun (stream:CharStream<int>) ->
let state = stream.UserState
// Should fail softly if `state` chars wasn't parsed
let result = attempt <| manyNSatisfy state (System.Char.IsWhiteSpace) <| stream
if result.Status <> Ok
then result
else restOfLine false <| stream
sepBy p newline
My issue is that when I run
runParserOnString test 1 "test" " hi\n there\nyou" |> printfn "%A"
I get an error on "you". I was under the impression that attempt would backtrack any state changes, and returning Error as my status would give me soft failure.
How do I get ["hi"; "there"] back from my parser?

Oh dear, how embarrassing.
I wanted sepEndBy, which is to say that I should terminate the parse on the separator.

This looks more idiomatic. I have hard-coded 1, but it's easy to extract as parameter.
let skipManyNSatisfy i = skipManyMinMaxSatisfy i i
let pMyText =
( // 1st rule
skipManyNSatisfy 1 System.Char.IsWhiteSpace // skip an arbitrary # of WhiteSpaces
>>. restOfLine false |>> Some // return the rest as Option
)
<|> // If the 1st rule failed...
( // 2nd rule
skipRestOfLine false // skip till the end of the line
>>. preturn None // no result
)
|> sepBy <| newline // Wrap both rules, separated by newLine
|>> Seq.choose id // Out of received string option seq, select only Some()

Parsing a particular string in Haskell

I'm using the parsec Haskell library.
I want to parse strings of the following kind:
[[v1]][[v2]]
xyz[[v1]][[v2]]
[[v1]]xyz[[v2]]
etc.
I'm interesting to collect only the values v1 and v2, and store these in a data structure.
I tried with the following code:
import Text.ParserCombinators.Parsec
quantifiedVars = sepEndBy var (string "]]")
var = between (string "[[") (string "") (many (noneOf "]]"))
parseSL :: String -> Either ParseError [String]
parseSL input = parse quantifiedVars "(unknown)" input
main = do {
c <- getContents;
case parse quantifiedVars "(stdin)" c of {
Left e -> do { putStrLn "Error parsing input:"; print e; };
Right r -> do{ putStrLn "ok"; mapM_ print r; };
}
}
In this way, if the input is "[[v1]][[v2]]" the program works fine, returning the following output:
"v1"
"v2"
If the input is "xyz[[v1]][[v2]]" the program doesn't work. In particular, I want only what is contained in [[...]], ignoring "xyz".
Also, I want to store the content of [[...]] in a data structure.
How do you solve this problem?

You need to restructure your parser. You are using combinators in very strange locations, and they mess things up.
A var is a varName between "[[" and "]]". So, write that:
var = between (string "[[") (string "]]") varName
A varName should have some kind of format (I don't think that you want to accept "%A¤%&", do you?), so you should make a parser for that; but in case it really can be anything, just do this:
varName = many $ noneOf "]"
Then, a text containing vars, is something with vars separated by non-vars.
varText = someText *> var `sepEndBy` someText
... where someText is anything except a '[':
someText = many $ noneOf "["
Things get more complicated if you want this to be parseable:
bla bla [ bla bla [[somevar]blabla]]
Then you need a better parser for varName and someText:
varName = concat <$> many (try incompleteTerminator <|> many1 (noneOf "]"))
-- Parses e.g. "]a"
incompleteTerminator = (\ a b -> [a, b]) <$> char ']' <*> noneOf "]"
someText = concat <$> many (try incompleteInitiator <|> many1 (noneOf "["))
-- Parses e.g. "[b"
incompleteInitiator = (\ a b -> [a, b]) <$> char '[' <*> noneOf "["
PS. (<*>), (*>) and (<$>) is from Control.Applicative.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

RCurl and non-Roman UTF-8 characters in URL - url

Related

How to specify headers for a web request

How to parse a sequence of " " separated values in Haskell

Scala parser failure handling, dangling commas

Whitespace sensitive FParsec

Parsing a particular string in Haskell

Categories

Resources