F# DSL for simple expressions?

I want to write a generic program to load data from text files or a database into a table.
The transformations between the source and destination shouldn't be hard-coded. They may have a format like:
ColA = Col1 + Col1 * 1.5
ColB = convert Col3 to date
These rules may need to be converted to SQL or C# code. Does F# already have a library to do this? Is F# a good language to implement it in?

With so few specific details in your question, we can't really give you a good answer. But here are a few F# libraries that you might find useful for what you're trying to do:
FSharp.Data - Whether your incoming data is in SQL, CSV, JSON, or XML, there's a type provider that can parse it for you and let you write type-safe queries against it.
FParsec - Lets you easily write custom parsers, so that you can define your transformations in a custom DSL without too much effort. You mentioned custom DSLs in your title, so that's why I'm recommending FParsec. I've used it myself for exactly that purpose, and it was great.
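To give you a flavor of the combinator style, here is a toy parser for rules like ColA = Col1 + Col1 * 1.5 that builds a little expression tree and prints it back out as SQL. I've sketched it with Scala's parser combinators rather than FParsec, but the shape of the code is much the same in F#, and every name in it is made up:

import scala.util.parsing.combinator.RegexParsers

// toy rule parser: "ColA = Col1 + Col1 * 1.5" becomes a small expression tree
object RuleParser extends RegexParsers {
  sealed trait Expr
  case class Col(name: String) extends Expr
  case class Num(value: Double) extends Expr
  case class BinOp(op: String, left: Expr, right: Expr) extends Expr

  def num: Parser[Expr] = """\d+(\.\d+)?""".r ^^ (s => Num(s.toDouble))
  def col: Parser[Expr] = """[A-Za-z]\w*""".r ^^ (n => Col(n))
  def factor: Parser[Expr] = num | col | "(" ~> expr <~ ")"
  def term: Parser[Expr] = factor ~ rep(("*" | "/") ~ factor) ^^ {
    case first ~ rest => rest.foldLeft(first) { case (l, op ~ r) => BinOp(op, l, r) }
  }
  def expr: Parser[Expr] = term ~ rep(("+" | "-") ~ term) ^^ {
    case first ~ rest => rest.foldLeft(first) { case (l, op ~ r) => BinOp(op, l, r) }
  }
  def rule: Parser[(String, Expr)] =
    """[A-Za-z]\w*""".r ~ ("=" ~> expr) ^^ { case target ~ e => (target, e) }

  // one possible back end: print the tree back out as SQL
  def toSql(e: Expr): String = e match {
    case Col(n)          => n
    case Num(v)          => v.toString
    case BinOp(op, l, r) => s"(${toSql(l)} $op ${toSql(r)})"
  }
}

// RuleParser.parseAll(RuleParser.rule, "ColA = Col1 + Col1 * 1.5")
// succeeds with ("ColA", BinOp("+", Col("Col1"), BinOp("*", Col("Col1"), Num(1.5))))

Once a rule is a tree, emitting C# instead of SQL is just a second printing function over the same Expr type.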
That's about all the help I can give you until I know more details about what you're trying to achieve.

Related

Parsing and pretty printing the same file format in Haskell

I was wondering if there is a standard, canonical way in Haskell to write not only a parser for a specific file format, but also a writer.
In my case, I need to parse a data file for analysis. However, I also simulate data to be analyzed and save it in the same file format. I could write a parser using Parsec or something equivalent and also write functions that produce the text output in the required format, but whenever I change my file format, I would have to change two functions in my code. Is there a better way to achieve this goal?
Thank you,
Dominik
The BNFC-meta package (https://hackage.haskell.org/package/BNFC-meta-0.4.0.3) might be what you're looking for:
"Specifically, given a quasi-quoted LBNF grammar (as used by the BNF Converter) it generates (using Template Haskell) a LALR parser and pretty printer for the language."
Update: found this package, which also seems to fulfill the objective (not tested yet): http://hackage.haskell.org/package/syntax
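Whichever package you pick, a cheap language-agnostic guard is a round-trip property: keep the format knowledge in one small module and test that parsing inverts printing, so a change that touches only one direction fails fast. A minimal sketch of the idea (written in Scala only for illustration; all names are hypothetical):

// a single small object owns the on-disk format: one "x,y" pair per line
case class Point(x: Int, y: Int)

object PointFormat {
  private val Line = """\s*(-?\d+)\s*,\s*(-?\d+)\s*""".r

  def parse(line: String): Option[Point] = line match {
    case Line(x, y) => Some(Point(x.toInt, y.toInt))
    case _          => None
  }

  def print(p: Point): String = s"${p.x},${p.y}"
}

// the round-trip law that keeps the two directions in sync:
// for every p, PointFormat.parse(PointFormat.print(p)) == Some(p)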

tool to extract data structures from unclean data

I have unstructured, generally unclean data in a database field. There are common structures which are consistent in the data,
namely:
field:
    name:value
fieldset:
    name <FieldSet>
    field(1),
    ...
    field(n)
table:
    name <table>
    head(1) ... head(n)
    val(1) ... val(n)
    ...
I was wondering if there is a tool (preferably in Java) that could learn/understand these data structures, parse the file, and convert it to a Map or object on which I could run validation checks?
I am aware of ANTLR, but understand it is more geared towards tree construction and not independent bits of data (am I wrong about this?).
Does anyone have any suggestions for the problem as a whole?
I recommend Talend. It is a very versatile, open-source data integration tool based on Java. You can use built-in tools/components to extract data from unstructured data sources. You can also write complex custom Java code to do what you want.
I used Talend in a couple of scientific proof-of-concept projects of mine. It worked for me. The good part is, it is free!
We ended up using ANTLR for this. It required us to make multiple lexers, where one lexer would manipulate the input for the next lexer.
Another project is pads, written in C.
You should use "bnflite"
https://github.com/r35382/bnflite
Using this template library, you develop a BNF-like grammar for your text by means of classes and overloaded operators directly in C++ code.
The benefit is that such a grammar is easily adjustable to your source.
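Before committing to a grammar tool, it may be worth seeing how small the baseline for the field: name:value layer is. A rough sketch (in Scala for illustration; names are made up, and it assumes one name:value pair per line):

// pull every "name:value" line out of a messy blob into a Map for validation
val NameValue = """([^:]+):(.+)""".r

def fields(blob: String): Map[String, String] =
  blob.linesIterator.collect {
    case NameValue(name, value) => name.trim -> value.trim
  }.toMap

// fields("name:value\nsome junk\nother: 42")
// => Map("name" -> "value", "other" -> "42")

The fieldset and table layers are where a real grammar (ANTLR, bnflite, etc.) starts to pay for itself.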

Need help to gather data from a txt file and insert in a web page?

Could someone advise me on the most efficient way to gather data from one source, select a specific piece of data, and insert it in a web page? Specifically, I wish to:
Call up this buoy data text file: http://www.ndbc.noaa.gov/data/realtime2/46237.txt
Find the water temperature and insert that value in my web page.
First big question: What scripting language should I use? (I'm assuming Fortran is not an option :-)
Second not so big question: This same data set is available in graphic and xml format. Would either of these data formats be more useful than the .txt file?
Thanks in advance.
Use Perl.
(Hey, you asked. Normally one programs in whatever language one would normally use.)
The XML format won't be much more useful than the text format.
This text file format is just about as simple as it could ever get. Just about any scripting or general-purpose programming language will work. The critical part is to split each line on the regex "\s+", e.g. in Python:
import re

# open the saved copy of the buoy report
with open('/path/to/downloaded/file.txt') as the_file_object:
    for line in the_file_object:
        if line.startswith('#'):  # skip header rows (column names and units)
            continue
        columns = re.split(r'\s+', line.strip())
        # each whitespace-separated field is now columns[0], columns[1], ...
So basically, choose whatever programming language seems easiest to you. Any .NET language would be equally capable, as well as Ruby, Python, Scheme, etc. I personally have a distaste for Perl because I find it very difficult to read.

Approaching Text Parsing in Scala

I'm making an application that will parse commands in Scala. An example of a command would be:
todo get milk for friday
So the plan is to have a pretty smart parser break the line apart and recognize the command part and the fact that there is a reference to time in the string.
In general I need to make a tokenizer in Scala, so I'm wondering what my options are for this. I'm familiar with regular expressions, but I plan on making an SQL-like search feature also:
search todo for today with tags shopping
And I feel that regular expressions will be too inflexible for implementing commands with a lot of variation. This leads me to think of implementing some sort of grammar.
What are my options in this regard in Scala?
You want to search for "parser combinators". I have a blog post using this approach (http://cleverlytitled.blogspot.com/2009/04/shunting-yard-algorithm.html), but I think the best reference is this series of posts by Stefan Zeiger (http://szeiger.de/blog/2008/07/27/formal-language-processing-in-scala-part-1/).
Here are slides from a presentation I did in Sept. 2009 on Scala parser combinators. (http://sites.google.com/site/compulsiontocode/files/lambdalounge/ImplementingExternalDSLsUsingScalaParserCombinators.ppt) An implementation of a simple Logo-like language is demonstrated. It might provide some insights.
Scala has a parser library (scala.util.parsing.combinator) which enables one to write a parser directly from its EBNF specification. If you have an EBNF for your language, it should be easy to write the Scala parser. If not, you'd better first try to define your language formally.
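As a rough illustration of what that looks like for the todo get milk for friday example (everything below is made up, a sketch rather than a finished design):

import scala.util.parsing.combinator.RegexParsers

// minimal command parser: "todo <task words> for <day>"
object CommandParser extends RegexParsers {
  case class Todo(task: String, day: String)

  def day: Parser[String] =
    "monday" | "tuesday" | "wednesday" | "thursday" |
    "friday" | "saturday" | "sunday" | "today" | "tomorrow"

  def word: Parser[String] = """\w+""".r

  // the task is every word up to the trailing "for <day>"
  def todo: Parser[Todo] =
    "todo" ~> rep1(not("for" ~ day) ~> word) ~ ("for" ~> day) ^^ {
      case words ~ d => Todo(words.mkString(" "), d)
    }
}

// CommandParser.parseAll(CommandParser.todo, "todo get milk for friday")
// succeeds with Todo("get milk", "friday")

Additional commands, such as the search ... with tags ... form, become extra alternatives in the same grammar, which is exactly where this approach scales better than a pile of regular expressions.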

Will ANTLR Help? Different Suggestion?

Before I dive into ANTLR (because it is apparently not for the faint of heart), I just want to make sure I have made the right decision regarding its usage.
I want to create a grammar that will parse a text file with predefined tags so that I can populate values within my application. (The text file is generated by another application.) So, essentially, I want to be able to parse something like this:
Name: TheFileName
Values: 5 3 1 6 1 3
Other Values: 5 3 1 5 1
In my application, TheFileName is stored as a String, and both sets of values are stored to an array. (This is just a sample, the file is much more complicated.) Anyway, am I at least going down the right path with ANTLR? Any other suggestions?
Edit
The files are created by the user and they define the areas via tags. So, it might look something like this.
Name: <string>TheFileName</string>
Values: <array>5 3 1 6 1 3</array>
Important Value: <double>3.45</double>
Something along those lines.
The basic question is how is the file more complicated? Is it basically more of the same, with a tag, a colon and one or more values, or is the basic structure of the other lines more complex? If it's basically just more of the same, code to recognize and read the data is pretty trivial, and a parser generator isn't likely to gain much. If the other lines have substantially different structure, it'll depend primarily on how they differ.
Edit: Based on what you've added, I'd go one (tiny) step further, and format your file as XML. You can then use existing XML parsers (and such) to read the files, extract data, verify that they fit a specified format, etc.
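To make the "pretty trivial" point concrete, here is a rough sketch (Scala, made-up names) that reads tag-colon-values lines like the ones shown above into a map, assuming one Tag: values pair per line:

// split each "Tag: v1 v2 ..." line into a tag and its whitespace-separated values
val Tagged = """([^:]+):\s*(.*)""".r

def read(text: String): Map[String, Array[String]] =
  text.linesIterator.collect {
    case Tagged(tag, rest) => tag.trim -> rest.trim.split("""\s+""")
  }.toMap

// read("Name: TheFileName\nValues: 5 3 1 6 1 3")
// => Map("Name" -> Array("TheFileName"), "Values" -> Array("5", "3", "1", "6", "1", "3"))

If the real file is only more of this, that is the whole parser; a generator like ANTLR earns its keep only when the line structures themselves vary.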
It depends on what control you have over the format of the file you are parsing. If you have no control then a parser-generator such as ANTLR may be valuable. (We do this ourselves for FORTRAN output files over which we have no control). It's quite a bit of work but we have now mastered the basic ANTLR lexer/parser strategy and it's starting to work well.
If, however, you have some or complete control over the format then create it with as much markup as necessary. I would always create such a file in XML as there are so many tools for processing it (not only the parsing, but also XPath, databases, etc.) In general we use ANTLR to parse semi-structured information into XML.
If you don't need for the format to be custom-built, then you should look into using an existing format such as JSON or XML, for which there are parsers available.
Even if you do need a custom format, you may be better off designing one that is dirt simple so that you don't need a full-blown grammar to parse it. Designing your own scripting grammar from scratch and doing a good job of it is a lot of work.
Writing grammar parsers can also be really fun, so if you're curious then you should go for it. But I don't recommend carelessly mixing learning exercises with practical work code.
Well, if it's "much more complicated", then, yes, a parser generator would be helpful. But, since you don't show the actual format of your file, how could anybody know what might be the right tool for the job?
I use the free GOLD Parser Builder, which is incredibly easy to use, and can generate the parser itself in many different languages. There are samples for parsing such expressions also.
If the format of the file is up to the user can you even define a grammar for it?
Seems like you just want a lexer at best. Using ANTLR just for the lexer part is possible, but would seem like overkill.
