How to read a .docx file using F#

How can I read a .docx file using F#? If I use
System.IO.File.ReadAllText("D:/test.docx")
it returns garbage output accompanied by beep sounds.

Here is an F# snippet that may give you a jump-start. It extracts all the text content of a .docx file created with Word 2010 as a single string of concatenated lines:
open System
open System.IO
open System.IO.Packaging
open System.Xml
let getDocxContent (path: string) =
    use package = Package.Open(path, FileMode.Open)
    let stream = package.GetPart(new Uri("/word/document.xml", UriKind.Relative)).GetStream()
    stream.Seek(0L, SeekOrigin.Begin) |> ignore
    let xmlDoc = new XmlDocument()
    xmlDoc.Load(stream)
    xmlDoc.DocumentElement.InnerText
printfn "%s" (getDocxContent @"..\..\test.docx")
To make it work, do not forget to reference WindowsBase.dll in your VS project.

.docx files follow the Open Packaging Conventions specification. At the lowest level, they are .ZIP files. To read them programmatically, see the examples here:
A New Standard For Packaging Your Data
Packages and Parts
Using F#, it's the same story: you'll have to use the classes in the System.IO.Packaging namespace.
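The point that a .docx is just a ZIP package can be seen without any packaging API at all. The following Python sketch is only an illustration, not part of the answers here: docx_paragraphs is a hypothetical helper name, and it assumes the main body is stored at word/document.xml (as in the F# snippet above) in the standard WordprocessingML namespace.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the text of each paragraph (w:p) in a .docx file, in document order."""
    # A .docx is a ZIP archive; the main body is the part word/document.xml
    with zipfile.ZipFile(path) as package:
        xml_bytes = package.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # Visible text lives in w:t elements inside the paragraph's runs
        runs = [t.text or "" for t in p.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return paragraphs
```

Unlike InnerText on the whole document, this keeps paragraphs separate, so the caller decides how to join them (for example "\n".join(docx_paragraphs("test.docx"))).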

System.IO.File.ReadAllText has type string -> string and decodes the file's bytes as text.
Because a .docx file is a binary file, the resulting string is likely to contain control characters, including the bell character (ASCII 7), which is what makes the console beep. Rather than ReadAllText, look into Word automation, the Packaging API, or the OpenXML APIs.

Try using the OpenXML SDK from Microsoft.
Also on the linked page is the Microsoft tool that you can use to decompile Office 2007 files. Be warned, though: the decompiled code can be quite lengthy even for simple documents. There is a steep learning curve associated with the OpenXML SDK, and I'm finding it quite difficult to use.

Related

lines=True parameter for the Json Type Provider and Json.Net library?

I am working on this Kaggle competition. The Jupyter notebooks on Kaggle only support R and Python and I wanted to use F# locally. The problem is that the datasets are .json files and both the F# Json Type Provider and Newtonsoft libraries fail when trying to parse the files.
Here are examples of the code failing in F#:
open FSharp.Data
type Context = JsonProvider<"train.json">
let context = Context.
and
open System
open System.IO
open Newtonsoft.Json
open Newtonsoft.Json.Linq
let object = JObject.Parse(File.ReadAllText("train.json"));
object
This Python example uses these lines of code to parse them correctly:
train = pd.read_json('../input/stanford-covid-vaccine/train.json', lines=True)
test = pd.read_json('../input/stanford-covid-vaccine/test.json', lines=True)
In the notebook, the author says that without the "lines=True" parameter, the read_json method fails with this trailing error.
My question: assuming this is the same error, is there a way to apply the same kind of "lines=True" behaviour to the .NET libraries to parse the JSON?
I've seen a few datasets where the format was one valid JSON record per line:
{"event":"nothing 1"}
{"event":"nothing 2"}
{"event":"nothing 3"}
This is not valid JSON overall. I think you can either parse it line-by-line or you can turn it into valid JSON. For line-by-line parsing (which may be more efficient as you can do this in a streaming fashion), I would use:
open System.IO
open FSharp.Data
type Log = JsonProvider<"""{"event":"nothing 1"}""">
// File.ReadLines enumerates lazily, so the file is streamed rather than loaded whole
for line in File.ReadLines("some.json") do
    let l = Log.Parse(line)
    printfn "%s" l.Event
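For comparison, the line-by-line approach looks much the same in Python, which is roughly what lines=True amounts to for this kind of file. This is a minimal sketch under that assumption, not pandas' actual implementation; read_json_lines is a hypothetical helper name.

```python
import json


def read_json_lines(path):
    """Yield one parsed object per non-empty line of a one-record-per-line JSON file."""
    with open(path, encoding="utf-8") as f:
        for line in f:  # the file object is iterated lazily, line by line
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)
```

Because it is a generator, records can be processed as they are read; wrap the call in list(...) if the whole dataset is needed in memory.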

Meaning of CsvProvider error "The given key was not present in the dictionary" when trying to load sample file?

I am having trouble loading CSV files with the FSharp.Data CSV provider bundled with FsLab, including the sample adwords.csv file.
What does the error below mean? Also, when I hover over the code in the Visual Studio editor, it says "The given key was not present in the dictionary".
Problem example:
#load "packages/FsLab/FsLab.fsx"
open System.IO
open FSharp.Data
"adwords.csv"
|> File.ReadAllLines
let test = CsvProvider<"adwords.csv">.GetSample()
The output:
>
val it : string [] =
[|"Criteria ID,Name,Canonical Name,Parent ID,Country Code,Target Type,Status";
"1000010,Abu Dhabi,"Abu Dhabi,Abu Dhabi,United Arab Emirates",9041082,AE,City,Active";
"1000011,Ajman,"Ajman,Ajman,United Arab Emirates",9047096,AE,City,Active";
"1000012,Al Ain,"Al Ain,Abu Dhabi,United Arab Emirates",9041082,AE,City,Active";
"1000013,Dubai,"Dubai,Dubai,United Arab Emirates",9041083,AE,City,Active";
"2004,Afghanistan,Afghanistan,,AF,Country,Active"|]
>
>System.MethodAccessException: Attempt by method '<StartupCode$FSI_0007>.$FSI_0007.main@()' to access method 'FSharp.Data.Runtime.CsvFile`1<System.__Canon>.Create(System.Func`3<System.Object,System.String[],System.__Canon>,
at <StartupCode$FSI_0007>.$FSI_0007.main@() in C:\test.fsx:line 11
Stopped due to error
I ran into this problem with my own files, so I grabbed this sample file from here: https://raw.githubusercontent.com/fsharp/FSharp.Data/master/tests/FSharp.Data.Tests/Data/Adwords.csv
Debug info:
If I delete the FSharp.Data library folder (v 2.3.0) and replace with version 2.2.5 it works correctly with no error.
If I don't use the FsLab.fsx script and instead use
#I "packages/FSharp.Data/lib/net40"
#r "FSharp.Data.dll"
then everything works.
The path to the FsLab.fsx script is correct, it runs when I send the line to fsi.
The F# version is 14.0.23413.0.
The version of FSharp.Data downloaded by FSlab is FSharp.Data.2.3.0.
I have no other references in the .fsx script.
I am using Visual Studio Community edition 14.0.24720.00 Update 1.
.NET version 4.6.01038
I am realizing now that I am not getting the popup asking if I want to allow the .dll like I think I used to get when I used this before.
There is nothing wrong with the file. This for example works:
#load @"..\..\FSLAB\packages\FsLab\FsLab.fsx"
open System.IO
open FSharp.Data
[<Literal>]
let csvFile = @"C:\tmp\adwords.csv"
File.Exists csvFile
type Csv = CsvProvider<csvFile>
let csv = Csv.Load(csvFile)
csv.Rows
There is probably something wrong with your FsLab or FSharp.Data installation, or perhaps with type provider security. Try the following: specify the path to the file directly. If it still doesn't work, install FSharp.Data from NuGet and try using the CSV type provider directly in a new project.
Other info would also be helpful: VS version, FsLab version, what other references you have, etc.
EDIT: Thanks for the debug info. That's actually quite helpful. VS2015 Update 1 broke two things, the Binding Redirect for Fsharp and the type providers (that might have been FSharp Tools, I forgot). I would upgrade to Update 2. If that's not possible please check if your FSharp.Data.TypeProviders.dll is in C:\Program Files (x86)\Reference Assemblies\Microsoft\FSharp\.NETFramework\v4.0\4.3.0.0\Type Providers.
As referencing the dlls directly works, it's probably a version mismatch issue. My FsLab install predates VS2015 Update 1 and 2, so will see if it behaves differently with a new download.
There is some issue with the installation of FSharp.Data currently bundled with FsLab (as of June 2016). This issue is with version 2.3.0. If you instead use FSharp.Data 2.2.5 the code works as expected.
Delete the packages/FSharp.Data folder and replace it with version 2.2.5. I did it from an old installation, but you could also get it from NuGet.

Exporting a list to OpenOffice Calc from Delphi

I'm using Delphi 7 and I'd like to export the contents of a list from my program to OpenOffice Calc using automation, instead of using files.
The task is simple: create new document, iterate through rows/columns and change cell data.
I've found some code but it's not complete, and I was hoping someone has some example code ready to accomplish this very simple task. It could save me a few hours of trying.
Thanks in advance!
Edit: I'd like to automate OpenOffice Calc to achieve what I wrote above. Thanks!
The easiest solution is to write CSV file output, and open that in OpenOffice.
There are also libraries to write .XLS files which both OpenOffice Calc and Excel can read. CSV is so simple, I wonder that you need an example. Create a TStringList, and add strings to it, in comma separated format. Save to file.
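One detail worth keeping in mind with hand-built CSV is that fields containing commas or quotes must be quoted. The sketch below uses Python's csv module purely to illustrate the expected file contents; it is not Delphi code, and write_rows is a hypothetical helper name.

```python
import csv


def write_rows(path, rows):
    """Write rows (lists of values) as CSV, quoting fields only where needed."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

A field such as Abu Dhabi, UAE is written as "Abu Dhabi, UAE" so that a spreadsheet does not split it into two columns; a TStringList-based writer would need to apply the same rule by hand.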
The so called "programmatic" method involves OLE automation.
uses
  OleAuto;
var
  mgr, calc, sheets, sheet1, dt, args: Variant;
begin
  args := VarArrayCreate(...);
  mgr := CreateOleObject('com.sun.star.ServiceManager');
  dt := mgr.createInstance('com.sun.star.frame.Desktop');
  calc := dt.loadComponentFromURL('private:factory/scalc', '_blank', 0, args);
  sheets := calc.getSheets();
  sheet1 := sheets.getByIndex(0);
  ...
Open Office supports Automation
see: http://udk.openoffice.org/common/man/tutorial/office_automation.html
Open Office info for Delphi can be found at:
http://development.openoffice.org/#OLE
The site ooomacros.org seems to be down, luckily the wayback machine still has a copy:
http://replay.web.archive.org/20090608051118/http://www.ooomacros.org/dev.php
Good luck.

Backslash read and write and F# interactive console

Edit: what's the difference between reading a backslash from a file and writing it to the interactive window, versus writing the string directly to the interactive window?
For example
let toto = "Adelaide Gu\u00e9nard"
toto;;
the interactive window prints "Adelaide Guénard".
Now if I save a txt file with the single line Adelaide Gu\u00e9nard and read it in:
System.IO.File.ReadAllLines(@"test.txt")
The interactive window prints [|"Adelaide Gu\u00e9nard"|]
What is the difference between these 2 statements in terms of the interactive window printing ?
As far as I know, there is no library that would decode the F#/C# escaping of string for you, so you'll have to implement that functionality yourself. There was a similar question on how to do that in C# with a solution using regular expressions.
You can rewrite that to F# like this:
open System
open System.Globalization
open System.Text.RegularExpressions
let regex = new Regex (@"\\[uU]([0-9A-F]{4})", RegexOptions.IgnoreCase)
let line = "Adelaide Gu\\u00e9nard"
// Replace each \uXXXX escape with the character that has that hex code
let decoded = regex.Replace(line, fun (m:Match) ->
    (char (Int32.Parse(m.Groups.[1].Value, NumberStyles.HexNumber))).ToString())
(If you write "some\\u00e9etc", you're creating a string that contains the same thing as what you'd read from the text file. If you use a single backslash, the F# compiler interprets the escape sequence itself.)
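The same regular-expression technique can be sketched in Python for comparison. This is an illustration only: decode_unicode_escapes is a hypothetical helper name, and like the pattern above it handles only four-digit \uXXXX escapes.

```python
import re

# A literal backslash, then u or U, then exactly four hex digits
ESCAPE = re.compile(r"\\[uU]([0-9a-fA-F]{4})")


def decode_unicode_escapes(text):
    """Replace each literal \\uXXXX sequence with the character it names."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
```

Applied to the 20-character-plus-escape string read from the file, decode_unicode_escapes("Adelaide Gu\\u00e9nard") gives the text with a real é in place of the six-character escape.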
It uses the StructuredFormat stuff from the F# PowerPack. For your string, it's effectively doing printfn toto;;.
You can achieve the same behaviour in a text file as follows:
open System.IO;;
File.WriteAllText("toto.txt", toto);;
The default encoding used by File.WriteAllText is UTF-8. You should be able to open toto.txt in Notepad or Visual Studio and see the é correctly.
Edit: If I wanted to write the content of test.txt to another file in the clean form that F# interactive prints, how would I proceed?
It looks like fsi is being too clever when printing the contents of test.txt. It's formatting it as a valid F# expression, complete with quotes, [| |] brackets, and a Unicode character escape. The string returned by File.ReadAllLines doesn't contain any of these things; it just contains the words Adelaide Guénard.
You should be able to take the array returned by File.ReadAllLines and pass it to File.WriteAllLines, without the contents being mangled.

How to write a simple .txt content processor in XNA?

I don't really understand how Content importer/processor works in XNA.
I need to read a text file (Content/levels/level1.txt) of the form:
x x
x x
x x
where x's are just integers, into an int[,] array.
Any tips on writing a SIMPLE .txt importer? Searching Google and MSDN, I only found .x/.fbx file importer examples, and they seem too complicated.
Do you actually need to process the text file? If not, then you can probably skip most of the content pipeline.
Something like:
string filename = "Content/TextFiles/sometext.txt";
string path = Path.Combine(StorageContainer.TitleLocation, filename);
string lineOfText;
// Dispose the reader when done so the file handle is released
using (StreamReader sr = new StreamReader(path))
{
    while ((lineOfText = sr.ReadLine()) != null)
    {
        // do something
    }
}
Also, be sure to set the "Build Action" to "None" and the "Copy to Output Directory" to "Copy if newer" on the text files you've added. This tells the content pipeline not to compile the text file but rather copy it to the output directory for use as is.
I got this (more or less) from the RacingGame sample provided by Microsoft. It foregoes much of the content pipeline and simply loads and processes text files (XML) for much of its level data.
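Once the lines are in memory, turning them into the integer grid the question asks for is a matter of splitting on whitespace and converting each token. The sketch below shows that step in Python as an illustration only (parse_grid is a hypothetical name); in C# the same loop would fill an int[,] after reading all lines.

```python
def parse_grid(lines):
    """Parse whitespace-separated integers, one row per non-blank line."""
    grid = [[int(token) for token in line.split()] for line in lines if line.strip()]
    # A rectangular array needs every row to have the same number of columns
    width = len(grid[0]) if grid else 0
    if any(len(row) != width for row in grid):
        raise ValueError("rows have different lengths")
    return grid
```

Checking that the rows are rectangular up front gives a clear error for a malformed level file instead of an index error later.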
XNA 4.0 uses
System.IO.Stream stream = TitleContainer.OpenStream("tilename.txt");
See http://msdn.microsoft.com/en-us/library/bb199094.aspx and also http://blogs.msdn.com/b/shawnhar/archive/2010/12/09/reading-files-in-xna-game-studio-4-0.aspx
There doesn't seem to be a lot of info out there, but this blog post does indicate how you can load .txt files through code using XNA.
Hopefully this can help you get the file into memory, from there it should be straightforward to parse it in any way you like.
XNA 3.0 - Reading Text Files on the Xbox
http://www.ziggyware.com/readarticle.php?article_id=69 is probably a good place to start. It covers creating a basic content processor.

Resources