I am working in Rails and downloaded a Word document from OneDrive through the Graph API. It returns a binary string, which is a collection of files. I need to convert this string into a .docx file, but whether I save it directly or Base64-decode it and write it as a binary file, it doesn't save in the right format; the file contains garbled content.
Any help in this regard will be appreciated.
Thanks
Can you not just save the binary string to a file?
data = <binary string>
File.open('document.docx', 'wb') do |f|
  f.write(data)
end
A .docx file is actually a ZIP archive containing a collection of files; it just uses the .docx extension instead of .zip. No conversion should be necessary, and no extra encoding should be needed to download it across the net.
You should be able to change the file extension to .zip and then extract it with unzip, with the result being a collection of XML (text) files and directories. If you can't do this, then you haven't correctly decoded it, so you should figure out what encoding you have requested and reverse that, or better, don't request encoding at all.
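As a quick sanity check, here is a minimal Ruby sketch that refuses to write the file unless the data begins with the ZIP signature bytes (PK\x03\x04) that a .docx file starts with. The helper names zip_archive? and save_docx are made up for illustration, and the sketch assumes the Graph API response body is already available as a Ruby string.

```ruby
# Signature found at the start of a non-empty ZIP archive, which is what a .docx is.
ZIP_SIGNATURE = "PK\x03\x04".b

# Hypothetical helper: true when the data begins with the ZIP signature.
def zip_archive?(data)
  data.b.start_with?(ZIP_SIGNATURE)
end

# Hypothetical helper: writes the bytes unchanged, in binary mode.
def save_docx(data, path)
  unless zip_archive?(data)
    raise ArgumentError, "not a ZIP archive; the data may still be encoded"
  end
  File.binwrite(path, data)
end
```

If the check fails and the data starts with the characters UEsDB, it is most likely still Base64-encoded, since that is how the ZIP signature looks after Base64 encoding.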
How can I parse text from a .docx file?
I already tried Data(contentsOf:) and String(contentsOf:) but nothing worked.
This can't be done using Data(contentsOf:) or String(contentsOf:) because .docx is a zipped format consisting of XML and other files. To parse the text from a .docx file, you first need to unzip it. In my case, I used ZIPFoundation to unzip the document. Then parse the file named word/document.xml under the extraction path using any XML parser, and you will be able to get the text from the document.
Sources:
Converting Docx Files To Text In Swift
Reading or Converting word .doc files iOS
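To illustrate the last step of that answer (pulling text out of word/document.xml once the archive is unzipped), here is a rough sketch in Ruby rather than Swift, using the REXML library. It assumes the XML has already been extracted into a string. The extract_docx_text name is made up, and real documents contain more structure (tables, tabs, line breaks) than this handles.

```ruby
require "rexml/document"

# Namespace URI used by WordprocessingML elements such as w:p and w:t.
W_NS = { "w" => "http://schemas.openxmlformats.org/wordprocessingml/2006/main" }

# Hypothetical helper: returns one string per paragraph (w:p),
# built by joining the text runs (w:t) found inside it.
def extract_docx_text(document_xml)
  doc = REXML::Document.new(document_xml)
  REXML::XPath.match(doc, "//w:p", W_NS).map do |paragraph|
    REXML::XPath.match(paragraph, ".//w:t", W_NS).map(&:text).join
  end
end
```

The same approach carries over to any XML parser: find the paragraph elements, then concatenate the text runs inside each one.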
How can I display the Chinese characters? Thanks.
This is my CSV file.
How can I solve it? Thank you.
LOAD CSV requires that the CSV file use UTF-8 character encoding. Your file may be using the wrong encoding.
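If the file turns out not to be UTF-8, it can be re-encoded before loading. A minimal Ruby sketch, assuming the source encoding is known: GB18030 is used as the default here only as a guess for Chinese text saved on Windows, and convert_to_utf8 is a made-up helper name.

```ruby
# Hypothetical helper: re-encodes a file's contents as UTF-8.
# source_encoding must be supplied by the caller; GB18030 is only an assumption.
def convert_to_utf8(in_path, out_path, source_encoding: "GB18030")
  # Tag the bytes with the encoding they were actually written in.
  text = File.read(in_path, encoding: source_encoding)
  # Transcode to UTF-8 and save the result.
  File.write(out_path, text.encode("UTF-8"))
end
```

If the guessed encoding is wrong, the output will still be garbled (or encode will raise), so it is worth checking the result in an editor before importing it.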
somefilename = somefilename0
File.open(somefilename0) do |in_file|
  somefilename = somefilename.sub! '.txt', '.csv'
  File.open(somefilename, 'w') do |out_file|
    in_file.each {|line| out_file << line.gsub('\t', ',') }
  end
end
I am trying to convert txt to csv and am using ruby. My code works, it converts the txt to csv. I have an original csv file that I converted to the txt file in order to play around with it. So, when I open the original csv file in a notepad text editor, and when I open the converted csv file (which was converted from txt but has the same data) in a notepad text editor, they look EXACTLY the same. There is literally no difference that I can see at all. It's just a small file, a few lines long.
However, when I open these files in excel, for some reason the converted csv file has an extra line between each original line. The original csv file has no such thing.
When I have them both open in a text editor though, they look EXACTLY THE SAME. What in the world is going on?
EDIT: Also, when I upload the txt file and save it, and convert it to a csv file, and then try to upload that csv file to a database, ruby says Unquoted fields do not allow \r or \n. However, if I just upload the original file to the database, it works just fine.
I have to develop an iOS application that can read data from a CSV file hosted on a domain. Are there any standard APIs that can help me do this? I don't need to download the file, just read it, because it will be updated every two minutes.
I recommend Dave DeLong's CHCSVParser library for parsing.
You will have to download the file; that is the only way to get it from the remote host to your device. A CSV file is a text file with values separated by commas (','). Download the file from the remote host, read it line by line, and split each line that was read.
For example:
1,2,3,4,1,2,3 ...Line 1
Split using ',' as a delimiter and add the split values into an array, the result will be:
array_line_one = {1,2,3,4,1,2,3};
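The splitting step can be sketched in a few lines. Ruby is used here for consistency with the other snippets on this page; the same idea applies in Swift or Objective-C. The sample line is the one from the answer, and converting the fields to integers is an added assumption.

```ruby
line = "1,2,3,4,1,2,3"

# Split on commas, then convert each piece to an integer.
values = line.split(",").map { |field| Integer(field) }
```

Note that naive splitting breaks on quoted fields that contain commas, which is one reason to use a CSV parsing library for real data.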
I am trying to parse a CSV file generated from an Excel spreadsheet.
Here is my code
require 'csv'
file = File.open("input_file")
csv = CSV.parse(file)
But I get this error
ArgumentError: invalid byte sequence in UTF-8
I think the error occurs because Excel encodes the file in ISO 8859-1 (Latin-1) and not in UTF-8.
Can someone help me with a workaround for this issue, please?
Thanks in advance.
You need to tell Ruby that the file is in ISO-8859-1. Change your file open line to this:
file = File.open("input_file", "r:ISO-8859-1")
The second argument tells Ruby to open read only with the encoding ISO-8859-1.
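A related variation is to transcode to UTF-8 while reading, so the parsed fields come out as ordinary UTF-8 strings. This is a small sketch under that assumption; read_latin1_csv is a made-up helper name, and the file is assumed to really be ISO-8859-1.

```ruby
require "csv"

# Hypothetical helper: reads a Latin-1 CSV file and returns rows of UTF-8 strings.
# "ISO-8859-1:UTF-8" means: the bytes on disk are ISO-8859-1, transcode to UTF-8 on read.
def read_latin1_csv(path)
  CSV.parse(File.read(path, encoding: "ISO-8859-1:UTF-8"))
end
```

With only "r:ISO-8859-1", the strings stay tagged as ISO-8859-1; adding ":UTF-8" converts them, which is usually more convenient when the rest of the application works in UTF-8.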
Specify the encoding with the encoding option:
CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
  ...
end
You can supply source encoding straight in the file mode parameter:
CSV.foreach("file.csv", "r:windows-1250") do |row|
  <your code>
end
If you only have one file (or a few), so there is no need to detect the encoding automatically for arbitrary input, and the contents are readable as plain text (txt, csv, etc.) separated by, for example, semicolons, you can manually create a new file with a .csv extension, paste the contents into it, and then parse it as usual.
Keep in mind that this is a workaround, but when you only need to parse one big Excel file on Linux, converted to some flavour of CSV, it saves time otherwise spent experimenting with encodings.
Save the file as UTF-8, unless for some reason you need to save it differently, in which case you can specify the encoding when reading the file.
Add "r:ISO-8859-1" as the second argument: File.open("input_file", "r:ISO-8859-1")
I had this same problem and was just using google spreadsheets and then downloading as a CSV. That was the easiest solution.
Then I came across this gem
https://github.com/singlebrook/utf8-cleaner
Now I don't need to worry about this issue at all. Hope this helps!
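If adding a gem is not an option, Ruby's built-in String#scrub can replace invalid bytes. This is not the utf8-cleaner API, just a standard-library alternative, and clean_utf8 is a made-up helper name.

```ruby
# Hypothetical helper: replaces bytes that are not valid UTF-8 with "?".
def clean_utf8(text)
  text.dup.force_encoding("UTF-8").scrub("?")
end
```

This discards the offending bytes rather than converting them, so when the real source encoding is known (for example ISO-8859-1), specifying it while reading preserves more of the data.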