I'm saving a TADO recordset to XML and trying to load it into a TClientDataSet, but I'm getting an error about the XML format being wrong.
How can I transform the ADO XML recordset format into the TClientDataSet format?
Thanks.
The XML format used by TClientDataSet is not the same as the one used by ADO. To transform the ADO XML format into a valid XML file that TClientDataSet can load, you have two options.
1) Use an XSLT transformation. For this you need an XSL stylesheet and/or an XSD schema for the XML; here you can find some hints about the XSD. (A sketch of applying such a transform follows these two options.)
2) Use one of the ADO components to read the ADO XML file, then iterate over the records and finally populate the TClientDataSet.
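For option 1 the mechanics are: load the stylesheet, apply it to the saved ADO XML, and write the result out as a TClientDataSet data packet. Purely as a sketch of that step (the stylesheet ado2cds.xsl is a placeholder you would have to write yourself, and a Delphi application would normally drive the transform through MSXML or a similar component rather than Java):

import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AdoToCds {
    public static void main(String[] args) throws Exception {
        // ado2cds.xsl is a hypothetical stylesheet that maps the ADO persisted-recordset
        // schema onto the TClientDataSet data packet format
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File("ado2cds.xsl")));
        // Apply the stylesheet to the saved ADO XML and write out CDS-style XML
        t.transform(new StreamSource(new File("ado_recordset.xml")),
                    new StreamResult(new File("cds_packet.xml")));
    }
}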
I see plenty of examples on how to convert Avro files to Parquet, with Parquet retaining the Avro schema in its metadata.
I'm confused, however, about whether there's some simple way of doing the opposite: converting Parquet to Avro. Any examples of that?
I think that with Impala, a query like this should work:
CREATE TABLE parquet_table(....) STORED AS PARQUET;
CREATE TABLE avro_table(....) STORED AS AVRO;
INSERT INTO avro_table SELECT * FROM parquet_table;
The Parquet data stored in parquet_table will be converted to Avro format and written into avro_table.
I have a few TB of log data in JSON format, and I want to convert it into Parquet format to get better performance in the analytics stage.
I've managed to do this by writing a MapReduce Java job that uses parquet-mr and parquet-avro.
The only thing I'm not satisfied with is that my JSON logs don't have a fixed schema; I don't know all the fields' names and types. Besides, even if I knew all the fields' names and types, my schema evolves as time goes on; for example, new fields will be added in the future.
For now I have to provide an Avro schema for AvroWriteSupport, and Avro only allows a fixed set of fields.
Is there a better way to store arbitrary fields in Parquet, just like in JSON?
One thing is for sure: Parquet needs an Avro schema in advance. So we'll focus on how to get the schema.
Use SparkSQL to convert JSON files to Parquet files.
SparkSQL can infer a schema automatically from the data, so we don't need to provide one ourselves. Every time the data changes, SparkSQL will infer a (possibly different) schema.
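As a rough sketch of this option using the Spark Java API (the paths and application name are placeholders):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonToParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("json-to-parquet").getOrCreate();
        // Spark scans the JSON files and infers a schema covering every field it sees
        Dataset<Row> logs = spark.read().json("hdfs:///logs/*.json");
        // The inferred schema becomes the Parquet schema on write
        logs.write().mode("overwrite").parquet("hdfs:///logs-parquet/");
        spark.stop();
    }
}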
Maintain an Avro schema manually.
If you don't use Spark but only Hadoop, you need to infer the schema yourself. First write a MapReduce job to scan all the JSON files and collect all the fields; once you know all the fields, you can write an Avro schema. Use this schema to convert the JSON files to Parquet files.
There will be new, unknown fields in the future; every time new fields appear, add them to the Avro schema. So basically we're doing SparkSQL's job manually.
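To give an idea of the field-collection step, here is a single-machine sketch using Jackson rather than a full MapReduce job; the directory name and the assumption of one JSON object per line are mine:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

public class CollectFields {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        Map<String, String> fields = new TreeMap<>();
        // Assumes one JSON object per line in every file under ./logs
        for (File file : new File("logs").listFiles()) {
            try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    JsonNode record = mapper.readTree(line);
                    Iterator<String> names = record.fieldNames();
                    while (names.hasNext()) {
                        String name = names.next();
                        // Remember the JSON node type of the last value seen for each field
                        fields.put(name, record.get(name).getNodeType().toString());
                    }
                }
            }
        }
        // The collected names/types are a starting point for hand-writing the Avro schema
        fields.forEach((name, type) -> System.out.println(name + " : " + type));
    }
}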
Use Apache Drill!
From https://drill.apache.org/docs/parquet-format/, it's one line of SQL.
After setting up Apache Drill (with or without HDFS), run sqlline to execute SQL queries:
// Set the default format
ALTER SESSION SET `store.format` = 'parquet';
ALTER SYSTEM SET `store.format` = 'parquet';
// Migrate data
CREATE TABLE dfs.tmp.sampleparquet AS (SELECT trans_id, cast(`date` AS date) transdate, cast(`time` AS time) transtime, cast(amount AS double) amountm, user_info, marketing_info, trans_info FROM dfs.`/Users/drilluser/sample.json`);
It can take some time, maybe hours, but at the end you'll have light and cool Parquet files ;-)
In my test, querying a Parquet file was about 4x faster than querying JSON and used fewer resources.
I tried to add instances to an ontology using WebProtégé. The problem is that the data is not assigned as a data/object property; instead it appears under the 'Type' heading in the 'Description' section. Is there any other quick way to add individuals from a CSV file?
Steps:
Use Jena to read the ontology into a Model X.
Write a small Java program that reads each row of the CSV file and converts it, according to the ontology vocabulary, into RDF statements.
The RDF statements can then be added to the same Model X.
So at the end you will have both your ontology and your data instances in the same Model X. Then write Model X out to a file, saving it as "RDF/XML".
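A minimal Jena sketch of those steps; the ontology file name, namespace, class/property names, and CSV layout are assumptions for illustration:

import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;

public class CsvToRdf {
    public static void main(String[] args) throws Exception {
        String ns = "http://example.org/onto#";        // assumed ontology namespace
        Model model = ModelFactory.createDefaultModel();
        model.read("ontology.owl");                    // step 1: load the ontology into Model X

        Resource personClass = model.createResource(ns + "Person"); // assumed class
        Property hasName = model.createProperty(ns + "hasName");    // assumed data property

        // steps 2-3: one individual per CSV row (assumed layout: id,name)
        try (BufferedReader csv = new BufferedReader(new FileReader("individuals.csv"))) {
            String row;
            while ((row = csv.readLine()) != null) {
                String[] cols = row.split(",");
                Resource individual = model.createResource(ns + cols[0]);
                individual.addProperty(RDF.type, personClass);
                individual.addProperty(hasName, cols[1]);
            }
        }

        // step 4: write ontology + individuals out together as RDF/XML
        model.write(new FileOutputStream("ontology-with-individuals.rdf"), "RDF/XML");
    }
}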
If you are using the Protégé ontology editor, you can use the plugin called "Cellfie", a Protégé Desktop plugin for mapping spreadsheets to OWL ontologies. The plugin is available under the menu item "Tools > Create axioms from Excel workbook...".
Let me explain my problem.
I have one ID and two other fields in a CSV file; the ID is connected to a database table.
I have to show the corresponding entries in the DB together with the fields from the CSV, and I need to sort the fields too.
My idea was to load the CSV into a ClientDataSet, look it up against a query on the table, and then sort and show the result.
My CSV has 85K records and it takes 120 seconds to load and sort, which is not acceptable. Can you tell me whether I can use BatchMove for this, so I can easily pick the fields with a simple query? If I can use BatchMove, please give me some guidelines.
Also, are there any other techniques for this?
Thanks and Regards,
Vijesh V.Nair
Maybe you can take a look at a global temporary table, but I don't think it will be quicker.
I often convert data from CSV files to a Firebird server too.
Usually I do it like this:
- Use MS Excel to read the CSV file first, to check that the file is not corrupt.
- Still in Excel, save the CSV file in XLS format, then close Excel after converting.
- Using the Axolot component (XLSReadWrite, www.axolot.com), read the file cell by cell and insert the values into a memory table.
- Using the FIBPlus components (www.devrace.com), insert them into the Firebird server.
- Job done, go home.
In this case, of course, you must prepare your Firebird server too. Once the conversion is finished, everything is easy, actually. You can also use the UIB components (www.progdigy.com) to connect to the Firebird server. Do not use the IBExpress (IBX) components, because their author EXPLICITLY states that IBX was never intended for Firebird.
I'm trying to convert an XML document into a dataset that I can import into a database (like SQLite or MySQL) that I can query from.
It's an XML file that holds most of the stuff in attributes. This is part of a Rails project so I'm very inclined to use Ruby (and that's the language I'm most comfortable with at the moment).
I'm not sure how to go about doing that and I'd welcome both high-level and low-level contributions.
xmlsimple can convert your XML into a Ruby object (or nested objects), which you can then loop over and do whatever you like with. It makes working with XML in Ruby really easy. As Jim says, though, it depends on your XML's complexity and your needs.
There are three basic approaches:
Use Ruby's XML stream parsing facilities to process the data with Ruby code and write the appropriate rows to the database (a sketch of this pattern is shown below).
Transform the XML using XSLT into a non-XML stream format and feed that into a Ruby program that updates the database.
Transform the XML with XSLT into a format acceptable to the bulk-loading tool for whatever database you are using.
Only you can determine the best approach depending on the XML schema complexity and the type of mapping you have to perform to get it into relational format.
It might help if you could post a sample of the XML and the DB schema you have to populate.
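The question leans toward Ruby, so take this purely as an illustration of the shape of the first approach (stream-parse the XML, pull out the attributes, insert rows). It uses Java's built-in StAX parser and JDBC with SQLite, and the element, attribute, and table names are invented, since the actual XML wasn't shown:

import java.io.FileInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class XmlToSqlite {
    public static void main(String[] args) throws Exception {
        // Requires the sqlite-jdbc driver on the classpath
        try (Connection db = DriverManager.getConnection("jdbc:sqlite:records.db")) {
            db.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS items (id TEXT, name TEXT, price TEXT)");
            PreparedStatement insert =
                db.prepareStatement("INSERT INTO items (id, name, price) VALUES (?, ?, ?)");

            XMLStreamReader xml = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new FileInputStream("input.xml"));
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT
                        && "item".equals(xml.getLocalName())) {
                    // Most of the data lives in attributes, so read them directly
                    insert.setString(1, xml.getAttributeValue(null, "id"));
                    insert.setString(2, xml.getAttributeValue(null, "name"));
                    insert.setString(3, xml.getAttributeValue(null, "price"));
                    insert.executeUpdate();
                }
            }
            xml.close();
        }
    }
}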
Will it load model data? If you're on *nix, take a look at libxml-ruby. =)
With it you can load the XML, and by iterating through the nodes you can create your AR objects.
You can have a look at the XMLMapping gem. It lets you define different classes depending upon the structure of your XML. Now you can create objects from those classes.
Now you will have to write some module that actually converts these XMLMapping objects into ActiveRecord objects. Once they are converted to AR objects, you can simply call save to store them in the corresponding tables.
It is a longer solution, but it will let you create objects out of your XML without iterating over it yourself; XMLMapping will do that for you.
Have you considered loading the data into an XML database?
Without knowing what the structure of the data is, I have no idea what the benefits of an RDBMS over an XML DB are.