Scala equivalent of java.util.Scanner - parsing

I am very familiar with using java.util.Scanner with next(), hasNext(), nextInt(), nextLine(), and the like to parse input.
Is there something else I should use in Scala?
This data isn't structured according to a grammar; it's more ad-hoc than that.
For example, let's say I had an inventory. Each line of input starts with the name, then has the quantity of those items, then has the ids for those items:
Firetruck 2 A450M A451M
Machine 1 QZLT
Keyboard 0
I see that Console has methods such as readInt(), but that reads an entire line of input; the equivalent of nextInt() doesn't seem to exist.
java.util.Scanner obviously does the trick. But is there something else I should use (for example, something that returns Scala rather than Java types)?

No, there is no equivalent Scala implementation. But I don't see a reason for one, as java.util.Scanner works perfectly fine and all Java primitive types are converted to Scala types implicitly.
So as for "for example, something that returns Scala rather than Java types": Scanner will return Scala types when used in Scala.

better-files is a Scala library that provides a faster, safer and more idiomatic replacement for java.util.Scanner, with additional operations such as peeking and skipping.
Link: https://github.com/pathikrit/better-files#scanner
Code sample:
val data = (home / "Desktop" / "stocks.tsv") << s"""
| id Stock Price Buy
| ---------------------
| 1 AAPL 109.16 false
| 2 GOOGL 566.78 false
| 3 MSFT 39.10 true
""".stripMargin
val scanner: Scanner = data.newScanner.skip(lines = 2)
assert(scanner.peekLine == Some(" 1 AAPL 109.16 false"))
assert(scanner.peek == Some("1"))
assert(scanner.nextInt == Some(1))
assert(scanner.peek == Some("AAPL"))
assert(scanner.nextString() == Some("AAPL"))
assert(scanner.nextInt() == None)
assert(scanner.nextDouble() == Some(109.16))
assert(scanner.nextBoolean() == Some(false))
while (scanner.hasNext) {
  println(scanner.nextInt(), scanner.next(), scanner.nextDouble(), scanner.nextBoolean())
}

With the input data delimited by spaces and newlines, this can be done nicely with map and split on each line of the input. The ids are whatever tokens follow the name and quantity, so they are taken with drop(2).
def input =
"""Firetruck 2 A450M A451M
Machine 1 QZLT
Keyboard 0"""
case class Item(name: String, quantity: Int, ids: Array[String])
scala> input.lines.map(_.split(" ")).map(split => Item(split(0), split(1).toInt, split.drop(2))).toList
res0: List[Item] = List(Item(Firetruck,2,[Ljava.lang.String;@6608842e), Item(Machine,1,[Ljava.lang.String;@391e1c57), Item(Keyboard,0,[Ljava.lang.String;@67d6b10c))
scala> res0.foreach(println(_))
Item(Firetruck,2,[Ljava.lang.String;@6608842e)
Item(Machine,1,[Ljava.lang.String;@391e1c57)
Item(Keyboard,0,[Ljava.lang.String;@67d6b10c)
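The same split-per-line idea, as a minimal language-neutral sketch written in Python rather than Scala. The names Item and parse_inventory are made up for illustration. The point it shows is that the ids are simply the tokens left over after the name and quantity, so a line may carry zero, one, or many of them:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    name: str
    quantity: int
    ids: List[str]


def parse_inventory(text: str) -> List[Item]:
    """Parse lines of the form: name quantity [id ...]."""
    items = []
    for line in text.splitlines():
        tokens = line.split()  # split on any run of whitespace
        if not tokens:
            continue  # skip blank lines
        # First token is the name, second the quantity, the rest are ids.
        name, quantity, *ids = tokens
        items.append(Item(name, int(quantity), ids))
    return items


sample = """Firetruck 2 A450M A451M
Machine 1 QZLT
Keyboard 0"""
items = parse_inventory(sample)
```

A line with fewer than two tokens, or a non-numeric quantity, raises a ValueError in this sketch; real input would probably need friendlier error handling.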

Related

Can I Use Multiple Properties in a StructuredFormatDisplayAttribute?

I'm playing around with StructuredFormatDisplay and I assumed I could use multiple properties for the Value, but it seems that is not the case. This question (and accepted answer) talk about customizing in general, but the examples given only use a single property. MSDN is not helpful when it comes to usage of this attribute.
Here's my example:
[<StructuredFormatDisplay("My name is {First} {Last}")>]
type Person = {First:string; Last:string}
If I then try this:
let johnDoe = {First="John"; Last="Doe"}
I end up with this error:
<StructuredFormatDisplay exception: Method 'FSI_0038+Person.First} {Last' not found.>
The error seems to hint at it only capturing the first property mentioned in my Value but it's hard for me to say that with any confidence.
I have figured out I can work around this by declaring my type like this:
[<StructuredFormatDisplay("My name is {Combined}")>]
type Person = {First:string; Last:string} with
    member this.Combined = this.First + " " + this.Last
But I was wondering if anyone could explain why I can't use more than one property, or if you can, what syntax I'm missing.
I did some digging in the source and found this comment:
In this version of F# the only valid values are of the form PreText {PropertyName} PostText
But I can't find where that limitation is actually implemented, so perhaps someone more familiar with the code base could simply point me to where this limitation is implemented and I'd admit defeat.

The relevant code from the F# repository is in the file sformat.fs, around line 868. Omitting lots of details and some error handling, it looks something like this:
let p1 = txt.IndexOf ("{", StringComparison.Ordinal)
let p2 = txt.LastIndexOf ("}", StringComparison.Ordinal)
if p1 < 0 || p2 < 0 || p1+1 >= p2 then
    None
else
    let preText = if p1 <= 0 then "" else txt.[0..p1-1]
    let postText = if p2+1 >= txt.Length then "" else txt.[p2+1..]
    let prop = txt.[p1+1..p2-1]
    match catchExn (fun () -> getProperty x prop) with
    | Choice2Of2 e ->
        Some (wordL ("<StructuredFormatDisplay exception: " + e.Message + ">"))
    | Choice1Of2 alternativeObj ->
        let alternativeObjL =
            match alternativeObj with
            | :? string as s -> sepL s
            | _ -> sameObjL (depthLim-1) Precedence.BracketIfTuple alternativeObj
        countNodes 0 // 0 means we do not count the preText and postText
        Some (leftL preText ^^ alternativeObjL ^^ rightL postText)
So, you can easily see that this looks for the first { and the last }, and then picks the text between them. So for foo {A} {B} bar, it extracts the text A} {B.
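To make the first-{ / last-} behaviour concrete, here is a small sketch of just the index arithmetic, written in Python for illustration. The function name split_format is hypothetical, and the sketch leaves out the property lookup and layout code entirely:

```python
from typing import Optional, Tuple


def split_format(txt: str) -> Optional[Tuple[str, str, str]]:
    """Split a format string into (preText, prop, postText) using the
    first '{' and the last '}', mirroring the IndexOf/LastIndexOf logic."""
    p1 = txt.find("{")   # index of the first opening brace, -1 if absent
    p2 = txt.rfind("}")  # index of the last closing brace, -1 if absent
    if p1 < 0 or p2 < 0 or p1 + 1 >= p2:
        return None
    pre_text = txt[:p1]
    prop = txt[p1 + 1:p2]  # everything between the two braces
    post_text = txt[p2 + 1:]
    return pre_text, prop, post_text


single = split_format("My name is {Combined}")
double = split_format("My name is {First} {Last}")
```

Here single is ("My name is ", "Combined", ""), while double comes out as ("My name is ", "First} {Last", ""), which is exactly the bogus property name that shows up in the error message.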
This does sound like a silly limitation and also one that would not be that hard to improve. So, feel free to open an issue on the F# GitHub page and consider sending a pull request!

Just to put a bow on this, I did submit a PR to add this capability and yesterday it was accepted and pulled into the 4.0 branch.
So starting with F# 4.0, you'll be able to use multiple properties in a StructuredFormatDisplay attribute, with the only downside that all curly braces you wish to use in the message will now need to be escaped by a leading \ (e.g. "I love \{ braces").
I rewrote the offending method to support recursion and switched to using a regular expression to detect property references. It seems to work pretty well, although it isn't the prettiest code I've ever written.

string comparison against factors in Stata

Suppose I have a factor variable with labels "a" "b" and "c" and want to see which observations have a label of "b". Stata refuses to parse
gen isb = myfactor == "b"
Sure, there is literally a "type mismatch", since my factor is encoded as an integer and so cannot be compared to the string "b". However, it wouldn't kill Stata to (i) perform the obvious parse or (ii) provide a translator function so I can write the comparison as label(myfactor) == "b". Using decode to (re)create a string variable defeats the purpose of encoding, which is to save space and make computations more efficient, right?
I hadn't really expected the comparison above to work, but I at least figured there would be a one- or two-line approach. Here is what I have found so far. There is a nice macro ("extended") function that maps the other way (from an integer to a label, seen below as local labi: label ...). Here's the solution using it:
// sample data
clear
input str5 mystr int mynum
a 5
b 5
b 6
c 4
end
encode mystr, gen(myfactor)
// first, how many groups are there?
by myfactor, sort: gen ng = _n == 1
replace ng = sum(ng)
scalar ng = ng[_N]
drop ng
// now, which code corresponds to "b"?
forvalues i = 1/`=ng' {
    local labi: label myfactor `i'
    if "b" == "`labi'" {
        scalar bcode = `i'
        continue, break
    }
}
di bcode
The second step is what irks me, but I'm sure there's also a faster, more idiomatic way of performing the first step. Can I grab the length of the label vector, for example?

An example:
clear all
set more off
sysuse auto
gen isdom = 1 if foreign == "Domestic":`:value label foreign'
list foreign isdom in 1/60
This creates a variable called isdom that equals 1 if foreign's value label is "Domestic" and is missing otherwise. It uses an extended macro function.
From [U] 18.3.8 Macro expressions:
Also, typing
command that makes reference to `:extended macro function'
is equivalent to
local macroname : extended macro function
command that makes reference to `macroname'
This explains one of the two : in the offered syntax. The other can be explained by
... to specify value labels directly in an expression, rather than through
the underlying numeric value ... You specify the label in double quotes
(""), followed by a colon (:), followed by the name of the value
label.
The quote is from Stata tip 14: Using value labels in expressions, by Kenneth Higbee, The Stata Journal (2004). Freely available at http://www.stata-journal.com/sjpdf.html?articlenum=dm0009
Edit
On computing the number of distinct observations, another way is:
by myfactor, sort: gen ng = _n == 1
count if ng
scalar sc_ng = r(N)
display sc_ng
But yours is fine. In fact, it is documented here: http://www.stata.com/support/faqs/data-management/number-of-distinct-observations/, along with more methods and comments.

Comparing discriminated union cases with < and > in F#

I'm learning F# and I am building a quick set of functions which compare two poker hands and determine the winner.
I made this discriminated union to represent categories of poker hands:
type Category =
    | HighCard
    | OnePair
    | TwoPair
    | ThreeOfAKind
    | Straight
    | Flush
    | FullHouse
    | FourOfAKind
    | StraightFlush
I use this code to compare categories to determine if one hand is better than another:
if playerCategory > houseCategory then Win
elif playerCategory < houseCategory then Loss
// ... More code to handle cases within the same category
So, for example, the expression:
let playerCategory = FullHouse
let houseCategory = HighCard
if playerCategory > houseCategory then Win
elif playerCategory < houseCategory then Loss
// ... Other code
Would have the value Win.
However, I don't understand how the < and > operators are able to work here. (Originally I had a function which mapped each case to a numeric value, but I realized it wasn't necessary.) If I rearrange the order of the cases then the logic breaks, so I'm assuming each case is assigned some default value corresponding to its order within the type?
But I would definitely appreciate a bit more insight...

This is described in the specification:
by default, record, union, and struct type definitions called
structural types implicitly include compiler-generated declarations
for structural equality, hashing, and comparison. These implicit
declarations consist of the following for structural equality and
hashing
8.15.4 Behavior of the Generated CompareTo implementations
If T is a union type, invoke Microsoft.FSharp.Core.Operators.compare
first on the index of the union cases for the two values, and then on
each corresponding field pair of x and y for the data carried by the
union case. Return the first non-zero result.
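The rule quoted above can be modelled in a few lines. This is only an illustrative sketch in Python, not the code the F# compiler generates: UnionValue, compare, and CASES are hypothetical names, with the case list copied from the Category type in the question.

```python
from dataclasses import dataclass
from typing import Tuple

# Declaration order determines the case index (the "tag").
CASES = ["HighCard", "OnePair", "TwoPair", "ThreeOfAKind", "Straight",
         "Flush", "FullHouse", "FourOfAKind", "StraightFlush"]


@dataclass(frozen=True)
class UnionValue:
    tag: int            # index of the union case
    fields: Tuple = ()  # data carried by the case, if any


def compare(x: UnionValue, y: UnionValue) -> int:
    """Compare case indices first, then corresponding fields in order;
    return the first non-zero result."""
    if x.tag != y.tag:
        return -1 if x.tag < y.tag else 1
    for a, b in zip(x.fields, y.fields):
        if a != b:
            return -1 if a < b else 1
    return 0


full_house = UnionValue(CASES.index("FullHouse"))
high_card = UnionValue(CASES.index("HighCard"))
result = compare(full_house, high_card)
```

Since FullHouse is declared after HighCard, result is positive, which is why playerCategory > houseCategory holds in the question, and also why reordering the cases changes the outcome.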

In addition to what Lee said, the spec also has this:
8.5.4 Compiled Form of Union Types for Use from Other CLI Languages
A compiled union type U has:
...
One CLI instance property U.Tag for each case C. This property fetches or computes an integer tag corresponding to the case.
The compiler-generated CompareTo method uses the backing fields of these properties to determine the index as stipulated in 8.15.4. This is evidenced by the ILSpy decompilation:
int tag = this._tag;
int tag2 = category._tag;
if (tag != tag2)
{
    return tag - tag2;
}
if (this.Tag != 0)
{
    return 0;
}

Comparing Discriminated Unions

I'm a newbie to F# and I'm playing around with FParsec. I want to use FParsec to generate an AST, and I would like to use FsUnit to write some tests around the various parts of the parser to ensure correct operation.
I'm having a bit of trouble with the syntax (sorry, the exact code is at work, I can post a specific example later) so how exactly could one compare two discriminated unions (one the expected, the other the actual result)? Could someone provide a tiny code example using FsUnit (or NUnit), please?
An example discriminated union (very simple)
type AST =
    | Variable of string
    | Class of string
    | Number of int

Since, as Brian pointed out, F# unions have structural equality, this is easy using whichever unit testing framework you are fond of.
FsUnit is an F#-specific library built on top of NUnit. My personal favorite F#-specific unit testing library is Unquote ;), which is framework agnostic and works very well with NUnit, xUnit.net, MbUnit, ... or even within FSI. You may be interested in this comparison with FsUnit.
So, how would you do this with NUnit + Unquote? Here's a full working example:
module UnitTests
open NUnit.Framework
open Swensen.Unquote
type AST =
    | Variable of string
    | Class of string
    | Number of int
let mockFParsec_parseVariable input = Variable(input)
[<Test>]
let ``test variable parse, passing example`` () =
    test <@ mockFParsec_parseVariable "x" = Variable("x") @>
[<Test>]
let ``test variable parse, failing example`` () =
    test <@ mockFParsec_parseVariable "y" = Variable("x") @>
Then running the tests using TestDriven.NET, the output is as follows:
------ Test started: Assembly: xxx.exe ------
Test 'UnitTests.test variable parse, failing example' failed:
UnitTests.mockFParsec_parseVariable "y" = Variable("x")
Variable "y" = Variable("x")
false
C:\xxx\UnitTests.fs(19,0): at UnitTests.test variable parse, failing example()
1 passed, 1 failed, 0 skipped, took 0.80 seconds (NUnit 2.5.10).

An example, if you want to check which case a value is but not its contents:
let matched x =
    match x with
    | Variable(_) -> true
    | _ -> false
Note that you need a different function like this for each case of the discriminated union.
If you want to compare equality, you can just do it in the standard way, like
Assert.AreEqual(Variable("hello"),result)
or
if result = Variable("hello") then stuff()

How to extract data from F# list

Following up my previous question, I'm slowly getting the hang of FParsec (though I do find it particularly hard to grok).
My next newbie F# question is, how do I extract data from the list the parser creates?
For example, I loaded the sample code from the previous question into a module called Parser.fs, and added a very simple unit test in a separate module (with the appropriate references). I'm using XUnit:
open Xunit
[<Fact>]
let Parse_1_ShouldReturnListContaining1 () =
    let interim = Parser.parse("1")
    Assert.False(List.isEmpty(interim))
    let head = interim.Head // I realise that I have only one item in the list this time
    Assert.Equal("1", ???)
Interactively, when I execute parse "1" the response is:
val it : Element list = [Number "1"]
and by tweaking the list of valid operators, I can run parse "1+1" to get:
val it : Element list = [Number "1"; Operator "+"; Number "1"]
What do I need to put in place of my ??? in the snippet above? And how do I check that it is a Number, rather than an Operator, etc.?

F# types (including lists) implement structural equality. This means that if you compare two lists that contain some F# types using =, it will return true when the lists have the same length and contain elements with the same properties.
Assuming that the Element type is a discriminated union defined in F# (and is not an object type), you should be able to write just:
Assert.Equal(interim, [Number "1"; Operator "+"; Number "1"])
If you wanted to implement the equality yourself, then you could use pattern matching on the two lists:
let expected = [Number "1"]
match interim, expected with
| [Number a], [Number b] when a = b -> true
| _ -> false

Resources