Haskell parsing data structure with extra information

I have a problem extracting extra information from my parsing.
I have my own data structure to parse, and that part works fine. I wrote the parser for my data structure as Parser MyDataStructure, which parses all the information about MyDataStructure.
The problem is that in the string I'm parsing, mixed in with MyDataStructure, there is also some information about what I should do with MyDataStructure. This is of course not part of MyDataStructure, i.e. I cannot store it inside MyDataStructure.
Now the problem is that I don't know how to store this information: in Haskell I cannot mutate some global variable to hold it, and the return value of my parser is already MyDataStructure.
Is there a way I can store this new information without changing MyDataStructure, i.e. without adding a field for it (the extra information is not part of MyDataStructure, so I would really like to avoid doing that)?
I hope I have been clear enough.

As #9000 says, you could use a tuple. If you find yourself needing to thread the extra information through a number of functions, the State monad might make things easier.
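For instance, here is a minimal sketch with Parsec, where MyDataStructure and Extra are hypothetical stand-ins for your real types. The parser's result type is simply a pair, so the extra information travels alongside the parsed value without touching MyDataStructure:

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- Hypothetical stand-ins for the real types.
    data MyDataStructure = MyDataStructure String Int deriving Show
    newtype Extra = Extra String deriving Show  -- the "what to do with it" info

    myData :: Parser MyDataStructure
    myData = MyDataStructure <$> many1 letter <*> (spaces *> (read <$> many1 digit))

    extraInfo :: Parser Extra
    extraInfo = Extra <$> (spaces *> char '!' *> many1 letter)

    -- Nothing global is needed: the extra information is simply
    -- returned next to the parsed structure.
    myDataWithExtra :: Parser (MyDataStructure, Extra)
    myDataWithExtra = (,) <$> myData <*> extraInfo

With this, parse myDataWithExtra "" "foo 42 !print" yields Right (MyDataStructure "foo" 42, Extra "print"). Parsec also has built-in user state (getState, modifyState) if the extra information needs to be accumulated across many sub-parsers.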

Related

Decoding only part of a JSON using Codable protocol

I have a JSON which contains an array of dictionaries, and I decode it using Swift's JSONDecoder class.
I wonder whether it's possible to make the class decode only some of the dictionaries rather than all of them, perhaps based on some criterion. I guess this might be useful if the array contains many dictionaries but you want only a single one of them.
If you know how to do this, I would appreciate your help.
Technically one can write an init(from:) method that manually gets the container from the decoder, then gets the "nested" container (e.g., nestedUnkeyedContainer), and manually decodes the items within that collection, adding only the ones you want. See Encoding and Decoding Custom Types for an introduction to writing init(from:) methods.
But I would discourage you from doing that. It's going to be much simpler and logical to parse the whole JSON and then filter the resulting collection to distill it down to the ones you want.
Unless you have a lot of records (e.g. millions?) where the parsing overhead becomes observable, I would suggest decoding the entire JSON and then filtering your array. This requires far less code and is the more logical approach.
And if you had that many records, before I contemplated the init(from:) kludge, I would reconsider whether to use JSON at all. I'd use Core Data or SQLite or something like that, which is better suited to filtering data dynamically as it is extracted.

Persistent Storage of an Array containing Tuples in Swift iOS

I have this array in a Swift iOS app. I want it to be both editable and permanently stored in the app. NSFileManager doesn't like the fact that the array contains tuples. Is there any way around that, or does anyone have suggestions as to how else I could store it?
var topicBook = [("Name",["Species"],"Subject","Rotation","Unit","extra",[(0,0)],"content"),("Name",["Species"],"Subject","Rotation","Unit","extra",[(0,0)],"contentTwo"),("Name",["Species"],"Subject","Rotation","Unit","extra",[(0,0)],"contentThree")]
From Apple in The Swift Programming Language:
Tuples are useful for temporary groups of related values. They are not suited to the creation of complex data structures. If your data structure is likely to persist beyond a temporary scope, model it as a class or structure, rather than as a tuple. For more information, see Classes and Structures.
And your tuple is pretty complex, so whether or not you need to persist the data, I'd recommend using a struct or class anyway. Otherwise it will inevitably become hard to read and work with, and it will hurt your productivity and that of anyone else who has to use it.

Is it possible to write a F# type provider to linked data?

I really like the Freebase and World Bank type providers, and I would like to learn more about type providers by writing one of my own. The European Union has an open data program where you can access data through SPARQL/linked data. Would it be possible to wrap access to the open EU data in a type provider, or would it be a waste of time trying to figure out how to do it?
Access to EU data is described here: http://open-data.europa.eu/en/linked-data
I think it is certainly possible - I have talked with some people who are actually interested in this (and are working on it, but I'm not sure what the current status is). Anyway, I definitely think this is such a broad area that an additional effort would not be a waste of time.
The key problem with writing a type provider for RDF-like data is deciding what to treat as types (what should become the name of a type or a property) and what should be left as values (returned as a list or as key-value pairs). This is quite obvious for the World Bank - names of countries and properties become types (property names) and values become data. But for a triple-based data set, this is less obvious.
So far, I think there are two approaches:
Additional ontology - require that the data source come with some additional ontology that specifies the keys for navigation. There is something called a "facet ontology", used on http://mspace.fm, which might be quite interesting.
Parameterization - parameterize the type provider (in some way) and give it a list of relations that should become available at the type level (you would probably also need to provide some root to start from).
There are definitely other possibilities - and I think having provider for linked data would be really interesting. If you wanted to do this for F# Data, there is a useful page on contributing :-).

Delphi TStringList wrapper to implement on-the-fly compression

I have an application that stores many strings in a TStringList. The strings will be largely similar to one another, and it occurs to me that one could compress them on the fly - i.e. store a given string as a mixture of unique text fragments plus references to previously stored fragments. String lists such as lists of fully-qualified paths and filenames should compress greatly.
Does anyone know of a TStringList descendant that implements this - i.e. provides read and write access to the uncompressed strings but stores them internally compressed, so that TStringList.SaveToFile produces a compressed file?
While you could implement this by uncompressing the entire stringlist before each access and re-compressing it afterwards, it would be unnecessarily slow. I'm after something that is efficient for incremental operations and random "seeks" and reads.
TIA
Ross
I don't think there's any freely available implementation around for this (not that I know of anyway, although I've written at least 3 similar constructs in commercial code), so you'd have to roll your own.
The remark Marcelo made about adding items in order is very relevant, as I suppose you'll probably want to compress the data at addition time - having quick access to entries already similar to the one being added gives much better performance than having to look up a 'best fit' entry (needed for similarity compression) over the entire set.
Another thing you might want to read up on is 'ropes' - a conceptually different type from strings, which I already suggested to Marco Cantu a while back. At the cost of a next-pointer per 'twine' (for lack of a better word), you can concatenate parts of a string without keeping any duplicate data around. The main problem is how to retrieve the parts that can be combined into a new 'rope' representing your original string. Once that problem is solved, you can reconstruct the data as a string at any time, while still having compact storage.
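The idea is language-agnostic; as a rough conceptual sketch (in Haskell, purely for illustration - the paths and names are invented), the point is that shared subtrees store a duplicated fragment only once:

    -- A conceptual rope: a binary tree whose leaves hold string
    -- fragments; concatenation is O(1) and subtrees are shared.
    data Rope = Leaf String | Cat Rope Rope

    -- Flatten a rope back into an ordinary string.
    toString :: Rope -> String
    toString (Leaf s)  = s
    toString (Cat a b) = toString a ++ toString b

    -- Two paths sharing one stored copy of the common prefix:
    home :: Rope
    home = Leaf "C:\\Users\\ross\\"

    fileA, fileB :: Rope
    fileA = Cat home (Leaf "notes.txt")
    fileB = Cat home (Leaf "todo.txt")

Retrieval is the hard part, as noted: you still need a way to find which existing fragments can serve as the shared pieces of a newly added string.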
If you don't want to go the 'rope' route, you could also try something called 'prefix reduction', which is a simple form of compression - just start each string with the index of a previous string and the number of characters that should be treated as a prefix of the new string. Be aware that you should not recurse this too far back, or access speed will suffer greatly. In one simple implementation, I did a mod 16 on the index to establish the entry at which prefix reduction started, which gave me on average about 40% memory savings (this number is completely data-dependent, of course).
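The scheme itself is language-agnostic; here is a minimal sketch of plain prefix reduction (in Haskell, for illustration), omitting the mod-16 restart described above - all names are invented:

    -- Number of leading characters two strings have in common.
    commonPrefixLen :: String -> String -> Int
    commonPrefixLen a b = length (takeWhile id (zipWith (==) a b))

    -- Encode each string as (shared-prefix length, fresh suffix),
    -- relative to the string immediately before it.
    encode :: [String] -> [(Int, String)]
    encode xs = zipWith step ("" : xs) xs
      where step prev s = let n = commonPrefixLen prev s in (n, drop n s)

    -- Decode by rebuilding each string from the previous decoded one.
    decode :: [(Int, String)] -> [String]
    decode = go ""
      where go _ [] = []
            go prev ((n, suffix) : rest) =
              let s = take n prev ++ suffix in s : go s rest

    -- decode (encode ["alpha", "alphabet", "alpine"])
    --   == ["alpha", "alphabet", "alpine"]

For sorted, similar strings each stored suffix is short, which is where the savings come from; the restart trick bounds how many entries a random read has to walk back through.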
You could try to wrap a Delphi or COM API around Judy arrays. The JudySL type would do the trick, and has a fairly simple interface.
EDIT: I assume you are storing unique strings and want to (or are happy to) store them in lexicographical order. If these constraints aren't acceptable, then Judy arrays are not for you. Mind you, any compression system will suffer if you don't sort your strings.
I suppose you expect general flexibility from the list (including the delete operation); in that case I don't know of any out-of-the-box solution, but I'd suggest one of two approaches:
You split your strings into words and keep a separate, growing dictionary to reference the words, internally saving each string as a list of indexes.
You implement something based on the zlib stream available in Delphi, but operating on blocks that each contain, for example, 10-100 strings. In this case you still have to decompress/recompress a complete block, but the "price" you pay is lower.
I don't think you really want to compress TStrings items in memory, because it's terribly inefficient. I suggest you look at the TStream implementations in the Zlib unit. Just wrap a regular stream in TDecompressionStream on load and TCompressionStream on save (you can even emit a gzip header there).
Hint: you will want to override LoadFromStream/SaveToStream instead of LoadFromFile/SaveToFile.

Using Haskell's Parsec to parse binary files?

Parsec is designed to parse textual information, but it occurs to me that Parsec could also be suitable for parsing binary file formats - complex formats that involve conditional segments, out-of-order segments, etc.
Is there a way to do this, or a similar, alternative package that does it? If not, what is the best way to parse binary file formats in Haskell?
The key tools for parsing binary files are:
Data.Binary
cereal
attoparsec
Binary is the most general solution, Cereal can be great for limited data sizes, and attoparsec is perfectly fine for e.g. packet parsing. All of these are aimed at very high performance, unlike Parsec. There are many examples on hackage as well.
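As a rough illustration of the attoparsec style (the one-byte-tag, one-byte-length record format and all names here are invented), a parser for a tiny length-prefixed record might look like this:

    import qualified Data.Attoparsec.ByteString as A
    import qualified Data.ByteString as B
    import Data.Word (Word8)

    -- An invented record layout: one tag byte, one length byte,
    -- then that many payload bytes.
    data Message = Message { msgTag :: Word8, msgBody :: B.ByteString }
      deriving Show

    message :: A.Parser Message
    message = do
      tag  <- A.anyWord8
      len  <- fromIntegral <$> A.anyWord8
      body <- A.take len
      return (Message tag body)

    -- A.parseOnly message (B.pack [1, 3, 104, 105, 33])
    --   == Right (Message {msgTag = 1, msgBody = "hi!"})

Because attoparsec works directly on ByteStrings and supports incremental input, it avoids the String overhead mentioned below.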
You might be interested in AttoParsec, which was designed for this purpose, I think.
I've used Data.Binary successfully.
It works fine, though you might want to use Parsec 3, Attoparsec, or Iteratees. Parsec's reliance on String as its intermediate representation may bloat your memory footprint quite a bit, whereas the others can be configured to use ByteStrings.
Iteratees are particularly attractive because it is easier to ensure they won't hold onto the beginning of your input, and they can be fed chunks of data incrementally as they become available. This prevents you from having to read the entire input into memory in advance and lets you avoid other nasty workarounds like lazy IO.
The best approach depends on the format of the binary file.
Many binary formats are designed to make parsing easy (unlike text formats that are primarily to be read by humans). So any union data type will be preceded by a discriminator that tells you what type to expect, all fields are either fixed length or preceded by a length field, and so on. For this kind of data I would recommend Data.Binary; typically you create a matching Haskell data type for each type in the file, and then make each of those types an instance of Binary. Define the "get" method for reading; it returns a "Get" monad action which is basically a very simple parser. You will also need to define a "put" method.
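As a sketch of that pattern (the Record type and its one-byte discriminator are invented for illustration):

    import Data.Binary (Binary (..), encode, decode)
    import Data.Binary.Get (getWord8)
    import Data.Binary.Put (putWord8)
    import Data.Word (Word32)

    -- An invented union type: a one-byte discriminator in the
    -- stream tells us which variant to expect next.
    data Record
      = IntRecord Word32
      | TextRecord String
      deriving (Show, Eq)

    instance Binary Record where
      put (IntRecord n)  = putWord8 0 >> put n
      put (TextRecord s) = putWord8 1 >> put s

      get = do
        tag <- getWord8
        case tag of
          0 -> IntRecord  <$> get
          1 -> TextRecord <$> get
          _ -> fail ("unknown record tag: " ++ show tag)

    -- decode (encode (TextRecord "hello")) == TextRecord "hello"

Note that Data.Binary's built-in instances (e.g. for String) use their own encoding, so for a pre-existing file format you would spell out the exact byte layout with primitives like getWord8 and getWord32le instead.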
On the other hand if your binary data doesn't fit into this kind of world then you will need attoparsec. I've never used that, so I can't comment further, but this blog post is very positive.
