Does Greenplum with PXF support Avro data with schema evolution?

We have user data (Avro files) validated and ingested into HDFS using Schema Registry (the data keeps evolving), and we use Greenplum with PXF to access the HDFS data. We created one external table, but querying the HDFS data returns an error:
warehouse=# select * from user;
ERROR: Record has 151 fields but the schema size is 152 (seg1 slice1 192.168.1.17:6001 pid=6582)
CONTEXT: External table user
warehouse=#
The user HDFS files were ingested using different schema versions, and the Greenplum external table was created with fields from all the schema versions.

Related

Azure Data Factory: read URL values from CSV and copy to a SQL database

I am quite new to ADF, so I am asking for suggestions.
The use case:
I have a CSV file which contains a unique id and URLs (see image below). I would like to use this file to export the values from the various URLs. In the second image you can see an example of the data behind one of the URLs.
In the current situation I take each URL and insert it manually as the source of an ADF Copy activity to export the data to a SQL DB. This is a very time-consuming method.
How can I create an ADF pipeline that uses the CSV file as a source, so that a Copy activity takes the URL from each row and copies the data to an Azure SQL DB? Do I need to add a Get Metadata activity, for example? If so, how?
Many thanks.
Use a Lookup activity that reads all the data, then a ForEach loop that iterates over it row by row. Inside the ForEach, use a Copy activity to copy the response to the sink.
To copy the XML response of a URL, we can use an HTTP linked service with an XML dataset. As @BeingReal said, a Lookup activity should be used to read the table that contains all the URLs, and inside a ForEach activity, add a Copy activity with HTTP as the source and whatever sink the requirement calls for. I tried to repro the same in my environment. Below are the steps.
A lookup table with 3 URLs is taken, as in the image below.
A ForEach activity is added in sequence with the Lookup activity.
Inside the ForEach, a Copy activity is added. The source is given as the HTTP linked service.
In the HTTP linked service, the base URL is given as @item().name. name is the column that stores the URLs in the lookup table; replace name with the column name you used in your lookup table.
In the sink, an Azure SQL database is given (use any sink your requirement calls for). The data is copied to the SQL database.
This is the HTTP dataset inside the Copy activity.
This is the input of the Copy activity inside the ForEach.
This is the output of the Copy activity.
My sink is an Azure SQL Database without any tables yet. I would like to auto-create the table on the fly from ADF. I don't understand why this error came up.

How to extract document size from a FileNet Oracle database

I am new to FileNet. Is it possible to write a SQL statement that retrieves the file size of a document from the FileNet Oracle database? What would the field name be?

Does Avro support schema evolution?

I am trying to understand whether Avro supports schema evolution for the following case:
A Kafka producer writes using schema1.
Then the producer writes again using schema2, where a new field has been added with a default value.
A Kafka consumer consumes both of the above messages using schema1.
I am able to read the first message successfully from Kafka, but for the second message I get an ArrayIndexOutOfBoundsException. That is, I am reading the second message (written using schema2) using schema1. Is this expected not to work? Is the consumer always expected to be updated first?
The other option is to use a schema registry, but I don't want to opt for that. So I would like to know whether schema evolution is possible for the above case.
When reading Avro data, you always need two schemata: the writer schema and the reader schema (they may be the same).
I'm assuming you're writing the data to Kafka using the BinaryMessageEncoder. This adds a 10-byte header describing the writer schema.
To read the message (using the BinaryMessageDecoder), you'll need to give it the read schema (schema1) and a SchemaStore. The latter can be connected to a schema registry, but it need not be. You can also use the SchemaStore.Cache implementation and add schema2 to it.
When reading the data, the BinaryMessageDecoder first reads the header, resolves the writer schema, and then reads the data as schema1 data.
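To make this concrete, here is a minimal sketch of that decode path using the GenericData API (the User record, its id/email fields, and the use of GenericRecord rather than generated classes are just assumptions for illustration):

import java.nio.ByteBuffer;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.avro.message.BinaryMessageDecoder;
import org.apache.avro.message.BinaryMessageEncoder;
import org.apache.avro.message.SchemaStore;

public class SchemaEvolutionSketch {
  public static void main(String[] args) throws Exception {
    // schema1: the consumer's read schema (hypothetical record, for illustration only)
    Schema schema1 = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"long\"}]}");

    // schema2: the newer writer schema -- same record plus a field with a default
    Schema schema2 = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"long\"},"
        + "{\"name\":\"email\",\"type\":\"string\",\"default\":\"\"}]}");

    // Producer side: BinaryMessageEncoder prepends the 10-byte header
    // carrying the fingerprint of schema2 (the writer schema).
    GenericRecord newer = new GenericRecordBuilder(schema2)
        .set("id", 42L)
        .set("email", "someone@example.com")
        .build();
    ByteBuffer payload =
        new BinaryMessageEncoder<GenericRecord>(GenericData.get(), schema2).encode(newer);

    // Consumer side: the read schema is schema1; the SchemaStore lets the decoder
    // resolve the writer schema from the header fingerprint.
    SchemaStore.Cache store = new SchemaStore.Cache();
    store.addSchema(schema2); // without this, decoding the newer message fails
    BinaryMessageDecoder<GenericRecord> decoder =
        new BinaryMessageDecoder<>(GenericData.get(), schema1, store);

    GenericRecord asSchema1 = decoder.decode(payload);
    System.out.println(asSchema1); // prints {"id": 42} -- the extra field is dropped on read
  }
}

If the producer writes raw Avro binary without the BinaryMessageEncoder header, this approach doesn't apply; the consumer then has to resolve the schemas itself, for example with a GenericDatumReader constructed from both the writer and the reader schema.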

TFS2010: FactTestResult table in Tfs_Warehouse is missing data

We are running some reports against TFS 2010, in particular on the unit tests that ran against a particular build.
These reports started returning no data after a certain date. My investigation shows that there is no data in the FactTestResult table after that date, while other tables, for example DimTestRun, have the data associated with the same test runs.
Of these two queries only the second one returns data:
SELECT * FROM FactTestResult WHERE TestRunSK = 58959
SELECT * FROM DimTestRun WHERE TestRunSK = 58959
But for an earlier TestRunSK both queries return data:
SELECT * FROM FactTestResult WHERE TestRunSK = 56582
SELECT * FROM DimTestRun WHERE TestRunSK = 56582
Any ideas on why the data is being lost for the FactTestResult table and if it can be fixed?
Try going to the Warehouse Control Web Service and checking the Processing Status, then manually process the data warehouse relational database by following the article Manually Process the Data Warehouse and Analysis Services Cube for Team Foundation Server.
To access the Warehouse Control Web Service:
Log on to the application-tier server.
Open Internet Explorer, type the following string in the Address bar, and then press ENTER:
http://localhost:8080/VirtualDirectory/TeamFoundation/Administration/v3.0/WarehouseControlService.asmx
If manually processing the data warehouse doesn't work, try to rebuild it by following the article Rebuild the Data Warehouse and Analysis Services Cube.

Map columns from Excel to DB in a Rails engine

I am working on a Rails engine which takes an Excel file as input and saves it to a temp folder in the app. When I click on validate, it validates the file, and if there are no errors it saves the data to the database. I am able to get the columns of the Excel file, but my problem is that an Excel file can have different columns from the ones my table expects.
Also, I am not generating any migration in the engine. The DB table will be provided by the app, and my engine will only parse the data, validate it, and, if the Excel file is valid, save all the data in the table.
I am getting confused about how to set up a mapping so that the data is saved to the DB.
For example: my engine has a model upload.rb where I do all the validation for the Excel file and then save it. Now suppose my app has a model employee.rb which has columns first name, last name, employee id, etc., while my Excel file can have an alias or a different column name for the data, e.g. fname, lname, etc.
Suppose I have validation in my engine model; the engine has no idea about the mapping, it will only validate and save the data in the DB. So the mapping needs to be set in my app.
Can anyone tell me how I can set up the mapping so that whenever my engine is mounted it will parse the Excel file and save it to the app's DB table as per the mapping that was set?
Any help will be appreciated.
