How to handle a data encoding issue when copying data from a CSV file to Parquet with the Azure copy activity? - character-encoding

I have a CSV file that I want to convert to Parquet. The CSV file contains the value Querý in one column.
I am using the copy activity in Azure Data Factory to convert it to Parquet, but the value comes out as Queryý. I can't find any encoding option in the sink. I have read some documentation, but all of it is about the encoding of the CSV file. Could someone help with this?

There is no way to set the encoding of Parquet in Azure Data Factory.
I created a pipeline to test this and it worked fine.
Here is some advice for troubleshooting:
Make sure the encoding of your csv file is correct.
Make sure your schema of Parquet is correct.
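One way to check the first point outside Data Factory is a minimal Ruby sketch like the one below. It is only an illustration: the helper name to_utf8 is made up, and it assumes that any input which is not valid UTF-8 was written as Windows-1252, which may not be true for your file.

```ruby
# Hypothetical helper: returns the CSV text as UTF-8.
# Assumption: input that is not valid UTF-8 was written as Windows-1252.
def to_utf8(raw_bytes)
  text = raw_bytes.dup.force_encoding("UTF-8")
  return text if text.valid_encoding?

  # Reinterpret the bytes as Windows-1252, then transcode to UTF-8.
  raw_bytes.dup.force_encoding("Windows-1252").encode("UTF-8")
end

legacy_bytes = "Quer\xFD".b # "Querý" stored as Windows-1252 bytes
puts to_utf8(legacy_bytes)
```

For a real file you could pass in File.binread("input.csv") (the file name is a placeholder) and write the result back before running the copy activity.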

Related

Avoid reading the same file multiple times using Telegraf and file input plugin

I need to read csv files inside a folder. New csv files are generated every time a user submits a form. I'm using the "file" input plugin to read the data and send it to InfluxDB. These steps are working fine.
The problem is that the same file is read multiple times, once every data collection interval. I was thinking of a solution where I could move each file that has been read to a different folder, but I couldn't do that with Telegraf's "exec" output plugin.
PS: I can't change the way the csv files are generated.
Any ideas on how to avoid reading the same csv file multiple times?
As you discovered, the file input plugin reads entire files at each collection interval.
My suggestion is for you to instead use the directory monitor input plugin. This will read files in a directory, monitor the directory for new files, and parse the ones that have not already been picked up yet. There are some configuration settings in that plugin that make it easier to time when new files are read as well.
Another option is to use the tail input plugin which will tail a file and only read new updates to that file as things come. However, I think the directory monitor is more likely something you are after for your scenario.
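A minimal configuration sketch for the directory monitor suggestion could look like this. The two directory paths are placeholders, and the CSV parser settings assume files with a single header row, so adjust them to your data.

```toml
[[inputs.directory_monitor]]
  ## Folder that receives the new csv files (placeholder path)
  directory = "/var/data/forms/incoming"
  ## Files are moved here once they have been parsed (placeholder path)
  finished_directory = "/var/data/forms/done"
  ## Only pick up files whose name ends in .csv
  files_to_monitor = ['\.csv$']
  ## Parse each file as CSV; assumes one header row
  data_format = "csv"
  csv_header_row_count = 1
```

Because parsed files are moved to finished_directory, they should not be picked up again at the next interval.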
Thanks!

How to export data as a XES file in rails?

I am working on an application that lets users upload and edit CSV files containing process activity logs.
I am not storing the data in the database.
So far the user can download the data as a new .csv file.
Now I want to convert that data into a XES file (this is a special XML format used mainly for process mining in tools such as the ProM framework).
I couldn't find any hint on how to do that.
I wonder if there's a rails gem or something.
Thanks in advance
ps.: here is the project code.
A simple method is to import the data into Disco and export it from there in XES format (I always use this method).
If you don't have access to Disco, another solution is to use the ProM software. There is a plugin for this platform that converts CSV files to XES format (developed by F. Mannhardt).
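If you would rather stay inside the Rails app, a rough sketch of writing the XML by hand is shown below. This is not what Disco or ProM do internally, only an illustration. The row keys :case_id, :activity and :timestamp and both helper names are hypothetical, timestamps are assumed to already be ISO 8601 strings, and you should check the output against the XES standard before relying on it.

```ruby
# Escapes & < > " and adds the surrounding double quotes.
def xml_attr(value)
  value.to_s.encode(xml: :attr)
end

# Hypothetical helper: builds one <trace> per case and one <event> per row.
# Assumed row shape: { case_id: "...", activity: "...", timestamp: "..." }
def rows_to_xes(rows)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<log xes.version="1.0" xmlns="http://www.xes-standard.org/">']
  rows.group_by { |row| row[:case_id] }.each do |case_id, events|
    lines << "  <trace>"
    lines << "    <string key=\"concept:name\" value=#{xml_attr(case_id)}/>"
    events.each do |event|
      lines << "    <event>"
      lines << "      <string key=\"concept:name\" value=#{xml_attr(event[:activity])}/>"
      lines << "      <date key=\"time:timestamp\" value=#{xml_attr(event[:timestamp])}/>"
      lines << "    </event>"
    end
    lines << "  </trace>"
  end
  lines << "</log>"
  lines.join("\n") + "\n"
end

rows = [
  { case_id: "c1", activity: "Submit form", timestamp: "2020-01-01T09:00:00Z" },
  { case_id: "c1", activity: "Review & approve", timestamp: "2020-01-01T10:30:00Z" },
  { case_id: "c2", activity: "Submit form", timestamp: "2020-01-02T08:15:00Z" }
]
puts rows_to_xes(rows)
```

In a controller the resulting string could be returned with send_data, the same way the .csv download presumably works now.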

writing source and custom sink using flume-ng

I am new to flume-NG. I want my source to send some unique xml files to the channel one by one. The channel will validate the xml files and send the validity (either true or false) and the xml file to the custom sink. This sink will write the valid files and invalid files to different directories in HDFS. I am not sure which source to use. Please help.
None of the current ones will fit your use case. The SpoolingDirectorySource is line-oriented so XML files will confuse it and not arrive in one piece.
I suggest you write a custom source for your application.

How to convert xls to csv using an iOS library?

I need to read xls files in my iOS app. First of all, I want to convert the xls files to csv files, and then have my app parse the csv files, but I can't find any iOS library that converts xls to csv. Please help me.
If you have a .xls file, you can use the open source DHlibxls library to read the file into your app. This is an Objective-C framework that wraps a C-based library. The library is quite mature.
iOS and the Objective-C frameworks don't provide anything for accessing Microsoft's xls format :(
Converting xls to and from csv is a project in itself!
On top of this, there are different xls formats, and now xlsx files. Writing an xls file and reading it back properly is a very cumbersome task. We have managed to read it, but it is not 100% efficient :(
I guess that in the near future you may want to move to xlsx files, and then your task will be a lot more difficult. You can check for yourself: change the file name extension to .zip and unzip it, and you will see many files, one with row numbers, another with columns, a third with links, a fourth with contents, and so on. Mapping these and getting the data into the correct form is not impossible, but it needs a lot of work.
There may be many other ways to do this. I can suggest using a Java API, or even saving your xls as csv directly from Excel; then your work will be easy.

Reading Excel files with roo /rails

I am using the rails gem called roo to read and parse uploaded Excel and CSV files.
I understand that roo reads an Excel file with Excel.new("myfilename"). My problem is that I have to read a file uploaded with the form helper (upload helper), which arrives as a temp file. I am saving the temp file before reading it with roo/Excel.
Although I am uploading good Excel files, I am getting
the file is not an Excel/xlsx
error.
Is there a way to read directly from the uploaded IO?
Can you tell me what I am doing wrong here?
Thanks!
If you are developing on a Windows box, you have to add a 'b' (binary) to the file mode when you open files, i.e.:
File.open("spreadsheet.xls","rb")
for read-only, binary mode.
Not sure if that's your problem, but I faced a similar problem and that was the solution.
good luck
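To illustrate the binary-mode point for the upload case, here is a small sketch that copies an uploaded IO to a temp file without any newline or encoding translation and keeps the original extension. The helper name save_upload_binary is made up, and whether this fixes the roo error depends on what is actually going wrong in your app.

```ruby
require "tempfile"
require "stringio"

# Hypothetical helper: copies an uploaded IO to a temp file in binary mode,
# keeping the extension of the original file name (e.g. ".xls").
def save_upload_binary(upload_io, original_name)
  temp = Tempfile.new(["upload", File.extname(original_name)])
  temp.binmode                    # no newline or encoding translation
  IO.copy_stream(upload_io, temp) # stream the bytes across unchanged
  temp.flush
  temp
end

# Stand-in for an uploaded file: arbitrary bytes that include "\r\n".
fake_upload = StringIO.new("\xD0\xCF\x11\xE0\r\n\x1A\n".b)
saved = save_upload_binary(fake_upload, "report.xls")
puts saved.path
```

The returned temp file path can then be handed to whichever spreadsheet reader you use.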
I am not familiar with roo, but I have used http://rubygems.org/gems/parseexcel
workbook = Spreadsheet::ParseExcel.parse("#{Dir.getwd}/public/excel/foo.xls")
