I have a project that uses GeoDjango to store GPS routes. The geometry is stored in a GeometryField. This works great when data is imported with geospatial information, but it is frustrating when I have a model which needs user-supplied data. I would like to have a widget in the Admin that will let me upload a file, and then use that file to essentially import the geospatial information.
The FileField field doesn't seem appropriate, since I don't want the file stored on the file system. I want it processed and stored in the geospatial DB field so I can run geospatial functions on the data.
Ideally the admin interface would contain a file upload widget and the geospatial field, shown with the typical map.
There are a couple of options for importing geodata files into the database.
If you want to use a zipped shapefile, GeoDjango comes with a nice solution, LayerMapping.
Before importing the file, you should implement the workflow for uploading the zip file with a form, checking that it contains the required extensions ([".shp", ".shx", ".dbf", ".prj"]), and saving the extracted files for reading.
Then you have to define a mapping that matches the Django model's field names to the field names in the file.
After you have completed these steps, you can save the geometries into the DB with:
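from django.contrib.gis.utils.layermapping import LayerMapping
# Placeholder: path to the .shp file extracted from the uploaded zip
layer = uploaded_and_extracted_file
# Keys are model field names; values are layer field names (or the geometry type for the geometry field)
mapping = {"id": "district", "name": "dis_name", "area": "shape_area", "geom": "MULTIPOLYGON"}
# transform=True reprojects the geometries to the SRID of the model's geometry field
lm = LayerMapping(ModelName, layer, mapping, transform=True, encoding="utf-8")
lm.save(verbose=True, strict=True, silent=True)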
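The extension check mentioned above can be sketched with the standard library alone. This is a minimal sketch, not part of GeoDjango: the function name missing_shapefile_parts is hypothetical, and it assumes the upload is available as a path or a seekable file-like object (which Django's uploaded file objects normally are).

```python
import io
import os
import zipfile

# Extensions a zipped shapefile must contain, as listed in the answer.
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf", ".prj"}


def missing_shapefile_parts(zip_file):
    """Return the set of required extensions absent from the archive.

    zip_file may be a path or a seekable file-like object.
    """
    with zipfile.ZipFile(zip_file) as archive:
        found = {
            os.path.splitext(name)[1].lower() for name in archive.namelist()
        }
    return REQUIRED_EXTENSIONS - found


if __name__ == "__main__":
    # Build a small in-memory archive that lacks the .prj file.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in ("districts.shp", "districts.shx", "districts.dbf"):
            archive.writestr(name, b"")
    buffer.seek(0)
    print(missing_shapefile_parts(buffer))
```

An empty result means the archive can be extracted and handed to LayerMapping; anything else can be reported back to the user as a form error.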
I am new to Snowflake, but my company has been using it successfully.
Parquet files are currently being written with an existing Avro Schema, using Java parquet-avro v1.10.1.
I have been updating the dependencies in order to use latest Avro, and part of that bumped Parquet to 1.11.0.
The Avro Schema is unchanged. However, when using the Snowflake COPY INTO command, I receive a LOAD FAILED with the error "Error parsing the parquet file: Logical type Null can not be applied to group node" and no other error details :(
The problem is that there are no null columns in the files.
I've cut the Avro schema down, and found that the presence of a MAP type in the Avro schema is causing the issue.
The field is
{
"name": "FeatureAmounts",
"type": {
"type": "map",
"values": "records.MoneyDecimal"
}
}
An example of the Parquet schema using parquet-tools.
message record.ResponseRecord {
required binary GroupId (STRING);
required int64 EntryTime (TIMESTAMP(MILLIS,true));
required int64 HandlingDuration;
required binary Id (STRING);
optional binary ResponseId (STRING);
required binary RequestId (STRING);
optional fixed_len_byte_array(12) CostInUSD (DECIMAL(28,15));
required group FeatureAmounts (MAP) {
repeated group map (MAP_KEY_VALUE) {
required binary key (STRING);
required fixed_len_byte_array(12) value (DECIMAL(28,15));
}
}
}
The two files I have, written with Parquet 1.10.1 and 1.11.0, both output this identical schema.
I have also tried with a bigger schema example, and it appears everything works fine if there is no "map" avro type present in the schema. I have other massive files with huge schemas, many union types that convert to groups in parquet, but all are written and read successfully when they don't contain any "map" types.
But as soon as I add back the "map" type, I get that weird error message from Snowflake when trying to ingest the 1.11.0 version (the 1.10.1 version, however, loads successfully). parquet-tools 1.11.0, 1.10.1, etc. can still read both files.
I understand from this comment that there are changes to the Logical Types in Parquet 1.11.0, but that the files are still supposed to be compatible for old versions to read.
But does anyone know what version of Parquet is used by Snowflake to parse these files? Is there something else that could be going on here?
Appreciate any assistance
Logical type Null can not be applied to group node
Looking up the error above, it appears that a version of Apache Arrow's parquet libraries is being used to read the file.
However, looking closer, the real problem lies in the use of legacy types within the Avro based Parquet Writer implementation (the following assumes Java was used to write the files).
The new logicalTypes schema metadata introduced in Parquet defines many types, including a singular MAP type. Historically, the former convertedTypes schema field supported the use of both MAP and MAP_KEY_VALUE for legacy readers. The new writers that use logicalTypes (1.11.0+) should not be using the legacy map type anymore, but work hasn't been done yet to update the Avro to Parquet schema conversions to drop the MAP_KEY_VALUE types entirely.
As a result, the schema field for MAP_KEY_VALUE gets written out with an UNKNOWN value of logicalType, which trips up Arrow's implementation that only understands logicalType values of MAP and LIST (understandably).
Consider logging this as a bug against the Apache Parquet project to update their Avro writers to stop nesting the legacy MAP_KEY_VALUE type when transforming an Avro schema to a Parquet one. It should've ideally been done as part of PARQUET-1410.
Unfortunately this is hard-coded behaviour and there are no configuration options that influence map-types that can aid in producing a correct file for Apache Arrow (and for Snowflake by extension). You'll need to use an older version of the writer until a proper fix is released by the Apache Parquet developers.
I want to perform a text generation task in a Flask app and host it on a web server. However, when downloading the GPT models, the Elastic Beanstalk managed EC2 instance crashes because the download takes too much time and memory.
from transformers.tokenization_openai import OpenAIGPTTokenizer
from transformers.modeling_tf_openai import TFOpenAIGPTLMHeadModel
model = TFOpenAIGPTLMHeadModel.from_pretrained("openai-gpt")
tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
These are the lines in question causing the issue. GPT is approx 445 MB. I am using the transformers library. Instead of downloading the model at this line I was wondering if I could pickle the model and then bundle it as part of the repository. Is that possible with this library? Otherwise how can I preload this model to avoid the issues I am having?
Approach 1:
Search for the model here: https://huggingface.co/models
Download the model from this link:
pytorch-model: https://s3.amazonaws.com/models.huggingface.co/bert/openai-gpt-pytorch_model.bin
tensorflow-model: https://s3.amazonaws.com/models.huggingface.co/bert/openai-gpt-tf_model.h5
The config file: https://s3.amazonaws.com/models.huggingface.co/bert/openai-gpt-config.json
Source: https://huggingface.co/transformers/_modules/transformers/configuration_openai.html#OpenAIGPTConfig
You can manually download the model (in your case, the TensorFlow .h5 model and the config.json file) and put it in a folder (let's say model) in the repository. If needed, you can compress the model and decompress it once it's on the EC2 instance.
Then, you can directly load the model in your web server from the path instead of downloading (model folder which contains the .h5 and config.json):
model = TFOpenAIGPTLMHeadModel.from_pretrained("model")
# model folder contains .h5 and config.json
tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
# this is a light download
Approach 2:
Instead of using links to download, you can download the model on your local machine using the conventional method.
from transformers.tokenization_openai import OpenAIGPTTokenizer
from transformers.modeling_tf_openai import TFOpenAIGPTLMHeadModel
model = TFOpenAIGPTLMHeadModel.from_pretrained("openai-gpt")
tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
This downloads the model. Now you can save the weights in a folder using the save_pretrained function.
model.save_pretrained('/content/') # saving inside content folder
Now, the content folder should contain a .h5 file and a config.json.
Just upload them to the repository and load from that.
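Before calling from_pretrained on a bundled folder, it can help to check at startup that the expected files were actually deployed. The sketch below uses only the standard library; the function name missing_model_files is hypothetical, and it assumes the file names that save_pretrained writes for a TensorFlow model (config.json and tf_model.h5). As far as I know, files fetched through the direct links in Approach 1 carry an openai-gpt- prefix and would need renaming to these names first.

```python
import os

# File names written by save_pretrained() for a TensorFlow model
# (assumption: the same names are what from_pretrained() expects in a folder).
EXPECTED_FILES = ("config.json", "tf_model.h5")


def missing_model_files(model_dir):
    """Return the expected model files that are not present in model_dir."""
    return [
        name
        for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(model_dir, name))
    ]


if __name__ == "__main__":
    # "model" is the example folder name used in Approach 1.
    missing = missing_model_files("model")
    if missing:
        print("Model folder is incomplete, missing:", missing)
    else:
        print("Model folder looks complete.")
```

Failing fast on a missing file avoids the library falling back to a network download on the EC2 instance.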
Open https://huggingface.co/models and search for the model you want. Click on the model name and finally click on "List all files in model". You will get a list of the files you can download.
An ADTF .dat file contains streams of data. In the .dat file there is only a stream name. To find the structure of the stream, one has to go through the DDL .description file.
Sometimes the .description files are incomplete or are missing the link from the stream name to the corresponding structure.
Is there some additional information about structure name hidden in the .dat file itself? (Or my understanding is completely wrong?)
You must distinguish between ADTF 2.x and ADTF 3.x and their (adtf)dat file structures.
ADTF 2.x:
You are right, you can only interpret the data with DDL. The stream must point to a structure described in the Media Description.
Sometimes the .description files are incomplete or are missing link
from stream name to corresponding structure.
You can avoid this by enabling the option Create Media Description in the Harddisk Recorder. A *.dat.description file will then be stored next to the *.dat file of the same name. It contains the correct stream and structure references, because they were available during recording.
Is there some additional information about structure name hidden in the .dat file itself?
No, it is only the stream name. So you need to know the underlying data structure to interpret the data. If you have the header (C struct), you can also convert it to DDL and refer to that.
ADTF 3.x:
To avoid these problems with unavailable or incorrect description files, ADTF 3.x stores the DDL in the *.adtfdat file itself.
I am looking for some help on how to create a trailer record in a flat file using SSIS. I have created an SSIS package that writes a custom header and loads the other records from the database into the flat file, which is a fixed-width flat file. Now, at the end of the file, I want to create a trailer record containing some static text and the record count. I tried searching on Google but could not find a good example. Any help is much appreciated.
Use a Script Task. Pass the file path of the fixed-width flat file to it as an input variable. Once within the Script Task, use .NET code to append the data that you need. I have written a post on it - https://karbimantras.wordpress.com/2016/08/12/adding-record-count-to-flat-file/
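The logic the Script Task needs is small: count the detail lines, then append one more line. The sketch below shows that logic in Python purely as an illustration; inside SSIS the same steps would be written in C# or VB.NET. The function name append_trailer, the "TRAILER" label, the single header line, and the 10-digit zero-padded count are all assumptions, not anything prescribed by SSIS.

```python
def append_trailer(path, header_lines=1, label="TRAILER", count_width=10):
    """Append a fixed-width trailer holding the number of detail records.

    Assumes every existing line in the file ends with a newline and that
    the first header_lines lines are not detail records.
    """
    # Count all lines currently in the file.
    with open(path, "r", encoding="utf-8") as handle:
        total_lines = sum(1 for _ in handle)

    record_count = max(total_lines - header_lines, 0)

    # Static text followed by the zero-padded record count.
    trailer = f"{label}{record_count:0{count_width}d}"

    with open(path, "a", encoding="utf-8") as handle:
        handle.write(trailer + "\n")
    return trailer
```

The Script Task would run after the Data Flow Task that writes the header and detail records, so the file is complete when the count is taken.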
I'm building an ASP.Net MVC4 application and the customer wants to be able to supply an XML configuration file, to configure a vendor list in the application, something like this:
<Vendors>
<Vendor name="ABC Computers" deliveryDays="10"/>
<Vendor name="XYZ Computers" deliveryDays="15"/>
</Vendors>
The file needs to be dropped onto a network location (i.e. not on the web server) and I don't have a database to import and store the data.
The customer also wants the ability to update it daily. So I'm thinking I'll have to do some kind of import (and validate the file) when the application starts up.
Any good ideas on the best way to accomplish this?
- The data needs to be quickly accessible
- Ideally I just want to import/store it once, or be able to access it quickly
- I need to be able to validate the file, so it might be prudent to be able to switch to a backup
One thought was to use something like Entity Framework and simply read the file whenever I needed it, but if possible I'd rather hold it in memory in the application.
Cheers
Vincent
No need to import it into a database or use Entity Framework. You can simply use .NET XML serialization to accomplish this.
The command line tool xsd.exe will generate C# classes from your XML file. From the command line:
xsd.exe myfile.xml
xsd.exe /c myfile.xsd
The first command will infer and create an XML schema file (myfile.xsd) from your XML. The second command will convert the schema file to C# classes.
Then use the XmlSerializer class to deserialize your XML file into objects (assuming multiple objects in one file):
MyCollection myObjects = null;
string path = "mydata.xml";
XmlSerializer serializer = new XmlSerializer(typeof(MyCollection));
// The using block closes the reader even if deserialization throws.
using (StreamReader reader = new StreamReader(path))
{
    myObjects = (MyCollection)serializer.Deserialize(reader);
}
You can use the .xsd file generated above to validate your xml files. Here's a link showing how: http://msdn.microsoft.com/en-us/library/ms162371.aspx.
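The "validate, and fall back to a backup" requirement from the question can be sketched independently of the serializer. The example below is a language-neutral illustration written in Python; in the MVC application the same checks would live in C#. The function names parse_vendors and load_vendors are hypothetical, and the rules (root element Vendors, each Vendor has a name and a numeric deliveryDays) are inferred from the sample file rather than from a real schema.

```python
import xml.etree.ElementTree as ET


def parse_vendors(xml_text):
    """Parse vendor XML into {name: delivery_days}; raise ValueError if invalid."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"malformed XML: {exc}") from exc
    if root.tag != "Vendors":
        raise ValueError(f"unexpected root element: {root.tag}")

    vendors = {}
    for node in root.findall("Vendor"):
        name = node.get("name")
        days = node.get("deliveryDays")
        if not name or days is None or not days.isdigit():
            raise ValueError("each Vendor needs a name and a numeric deliveryDays")
        vendors[name] = int(days)
    return vendors


def load_vendors(primary_text, backup_text):
    """Use the primary file if it validates, otherwise the last known good backup."""
    try:
        return parse_vendors(primary_text)
    except ValueError:
        return parse_vendors(backup_text)
```

The returned dictionary can be held in memory and replaced whenever a new file validates, which keeps lookups fast and means a bad daily upload never displaces the last good data.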