How to read table rows and combine them with file content compressed in a gz file, using Apache Beam on Dataflow? - google-cloud-dataflow

I have a table that stores metadata in BigQuery:
id, domain, file_path
1, domain_1.com, gs://path_to_zip_1.tar.gz
2, domain_2.com, gs://path_to_zip_1.tar.gz
3, domain_3.com, gs://path_to_zip_1.tar.gz
4, domain_4.com, gs://path_to_zip_2.tar.gz
5, domain_5.com, gs://path_to_zip_2.tar.gz
6, domain_6.com, gs://path_to_zip_2.tar.gz
Each row stores a domain and the GCS path of a gz file.
Each gz file contains a set of per-domain txt files.
Each domain txt file contains JSON.
For example:
path_to_zip_1.tar.gz contains domain_1.txt, domain_2.txt, domain_3.txt
path_to_zip_2.tar.gz contains domain_4.txt, domain_5.txt, domain_6.txt
domain_#.txt: {"key":"value","key":"value","key":"value",....}
I am using Apache Beam for my pipeline.
How can I read the table rows and combine each one with the content of its domain_#.txt file?
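The join described above can be sketched in plain Python, outside of Beam, to make the data flow concrete. This is only a sketch under assumptions: read_bytes is a hypothetical callable that returns the raw bytes of a file_path, and the member name is assumed to be the domain with its last suffix replaced by .txt (domain_1.com becomes domain_1.txt). In a Beam pipeline the same body could run inside a DoFn after grouping rows by file_path; that wiring is not shown here.

```python
import io
import json
import tarfile
from collections import defaultdict


def join_rows_with_archive(rows, read_bytes):
    """Attach the parsed JSON of each row's domain txt file to the row.

    rows: iterable of dicts with keys id, domain, file_path (as in the table).
    read_bytes: hypothetical callable mapping a file_path to the archive bytes.
    """
    # Group rows by archive so each .tar.gz is opened only once.
    by_path = defaultdict(list)
    for row in rows:
        by_path[row["file_path"]].append(row)

    joined = []
    for path, group in by_path.items():
        # Assumption: domain_1.com is stored as domain_1.txt inside the archive.
        wanted = {row["domain"].rsplit(".", 1)[0] + ".txt": row for row in group}
        with tarfile.open(fileobj=io.BytesIO(read_bytes(path)), mode="r:gz") as tar:
            for member in tar:
                name = member.name.rsplit("/", 1)[-1]
                if member.isfile() and name in wanted:
                    content = json.loads(tar.extractfile(member).read())
                    joined.append({**wanted[name], "content": content})
    return joined
```

Grouping by file_path first means an archive shared by several rows is downloaded and decompressed once rather than once per row.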

Related

Neo4j LOAD CSV Error: unknown protocol: c

LOAD CSV FROM "file:/C:/Users/abcd/Desktop/Neo4J/fileName.csv" AS row
WITH row
RETURN row
This is the code I use to import the CSV into my database, but it gives me this error:
Neo.ClientError.Statement.ExternalResourceFailed: Invalid URL
'C:/Users/abcd/Desktop/Neo4J/fileName.csv': unknown protocol: c
Can anyone help me solve this?
Local CSV files are accessible through file:/// URLs, which identify files on the filesystem of the database server.
You need to add file:/// as the protocol before the local file's path, as follows:
LOAD CSV FROM "file:///C:/Users/abcd/Desktop/Neo4J/fileName.csv" AS row
WITH row
RETURN row
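If the Cypher statement is built from a script, the same conversion can be done programmatically. The sketch below is an illustration only; to_file_url is a hypothetical helper name, and it assumes an absolute Windows path that starts with a drive letter.

```python
from urllib.parse import quote


def to_file_url(windows_path):
    """Turn an absolute Windows path into a file:/// URL (sketch)."""
    # Backslashes are escape characters in Cypher strings, so use forward slashes.
    posix = windows_path.replace("\\", "/")
    if len(posix) < 3 or posix[1:3] != ":/":
        raise ValueError("expected an absolute path such as C:/dir/file.csv")
    # Percent-encode characters such as spaces, but keep / and the drive colon.
    return "file:///" + quote(posix, safe="/:")


print(to_file_url(r"C:\Users\abcd\Desktop\Neo4J\fileName.csv"))
# file:///C:/Users/abcd/Desktop/Neo4J/fileName.csv
```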
NOTE:
You need to change the neo4j.conf file to allow CSV import from file URLs.
Uncomment this line (remove the #):
#dbms.security.allow_csv_import_from_file_urls=true
Comment out this line (add a # at the start):
dbms.directories.import=import
Don't forget to restart Neo4j after these changes.
Try the line below, which uses some extra slashes:
LOAD CSV FROM "file:///C:/Users/abcd/Desktop/Neo4J/fileName.csv" AS row
WITH row
RETURN row

Neo4j Importing local CSV File

I'm trying to import a local CSV file, but I get an InvalidSyntax error.
LOAD CSV WITH HEADERS FROM file:C:/csv/user.csv
Invalid input '/' (line 1, column 35 (offset: 34))
"LOAD CSV WITH HEADERS FROM file:C:/csv/user.csv"
You need to put the filename in quotes, and add a few more slashes:
LOAD CSV WITH HEADERS FROM "file:///C:/csv/user.csv"
Full documentation here.
The command below will return the first 5 lines of your CSV file:
LOAD CSV WITH HEADERS FROM "file:///<PATH_TO_YOUR_CSV_FILE>" AS line WITH line RETURN line LIMIT 5;
But you'll have to follow some steps to align with Neo4J security restrictions.
1) Find the conf folder in the neo4j server folder.
Open the neo4j.conf with a text editor.
2) Uncomment the line containing:
#dbms.security.allow_csv_import_from_file_urls=true
To uncomment it, just remove #. It should be like this:
dbms.security.allow_csv_import_from_file_urls=true
3) Comment this line below:
dbms.directories.import=import
To comment it, add #. It should be like this:
#dbms.directories.import=import
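Steps 2 and 3 can also be applied with a small script rather than by hand. This is a minimal sketch, assuming the two setting names quoted above appear one per line in neo4j.conf; apply_csv_import_settings is a hypothetical helper that works on the file's text and leaves reading and writing the file to the caller.

```python
def apply_csv_import_settings(conf_text):
    """Uncomment the allow_csv_import line and comment out the import dir line."""
    allow = "dbms.security.allow_csv_import_from_file_urls="
    import_dir = "dbms.directories.import="
    out = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        uncommented = stripped.lstrip("#").strip()
        if uncommented.startswith(allow):
            # Step 2: remove any leading '#' so the setting becomes active.
            out.append(uncommented)
        elif stripped.startswith(import_dir):
            # Step 3: add a '#' so the import directory restriction is disabled.
            out.append("#" + stripped)
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Neo4j still has to be restarted afterwards for the changed settings to take effect.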
More on importing CSV data into Neo4j can be found here: https://neo4j.com/blog/importing-data-neo4j-via-csv/
LOAD CSV WITH HEADERS FROM "file:C:/path/location/filename.csv" AS row
I found that this query makes Neo4j look in a specific location:
C:\Users\*******\.Neo4jDesktop\neo4jDatabases\database-2b9d81ff-1976-427e-ba98-4f3191c3ef62\installation-3.4.9\import
Placing the CSV there and using the query
LOAD CSV WITH HEADERS FROM "file:///testData2.csv" AS line
solved the issue for me.
Alternatively, you can change that location through this setting:
dbms.directories.import=import
NB: I am using Windows 10 and neo4j-desktop-offline-1.1.12.
I had the same problem (on Windows 10) and realized that I was trying to load the CSV file without telling the query to return anything.
For me it worked well like this:
LOAD CSV WITH HEADERS FROM "file:///C:all_data.csv" AS line
RETURN line
Note: Do not forget to place the CSV file that you want to import in the Neo4j import folder!

Loading whole file from source into HDFS in flume

How do I keep the source filename unchanged when a file goes from the source into HDFS in Flume?
Ex: source file /usr/sample.txt should become /tmp/sample.txt in HDFS, not something like flumeevetns.23343.tmp.
How do I stop the timestamp and .tmp from being appended? Ex: in flumeevent.12334343.tmp, I don't want the 12334343.tmp part.
How do I read a whole file in Flume?
How do I read a CSV file in Flume?
For the spooldir source, you need to set the fileHeader parameter, which adds a header with the source file's path and is false by default.
agentname.sources.sourcename.fileHeader=true
It will keep the same file name when pushing into HDFS.

Neo4j - Syntax for Loading CSV with Headers

I'm just getting started with Neo4j, and have been trying to create my first project in Neo4j Community with a small sample data from a CSV. I keep getting an invalid input/syntax error (see image below).
The problem could be several places:
I may not have set up my project correctly
I may not have the file in the right place
I may not be using the syntax correctly
Here is the Cypher I've been using to try to load the file:
LOAD CSV WITH HEADERS FROM 'C:\Users\Diana\Documents\Nattosphere\Natto_Sample.csv' AS line
CREATE (n: Natto_Variety{Product_UID: line.Product_UID, Product_Manufacturer: line.Product_Manufacturer, Product_Weight_g: line.Product_Weight_g, Product_Flavoring: line.Product_Flavoring})
I've tried several approaches, and created a simplified file, but am getting the same error each time:
Invalid input 's': expected org$neo4j$cypher$internal$compiler$v2_2$parser$Strings$$HexDigit (line 1, column 33 (offset: 32))
"LOAD CSV WITH HEADERS FROM 'C:\Users\Diana\Documents\Nattosphere\Natto_Sample.csv' AS line"
At the bottom of the GUI, another error reads:
"Neo.ClientError.Statement.InvalidSyntax"
Any idea what might be happening here?
-D
I think you have to do:
LOAD CSV WITH HEADERS FROM 'file:C:/Users/Diana/Documents/Nattosphere/Natto_Sample.csv'
If that doesn't work try:
LOAD CSV WITH HEADERS FROM 'file://C:\Users\Diana\Documents\Nattosphere\Natto_Sample.csv'
See: http://neo4j.com/developer/guide-import-csv/
First, set dbms.security.allow_csv_import_from_file_urls to true.
Second, set dbms.directories.import to the folder where Neo4j should look for CSV files.
Once both are set, copy your CSV file into the folder named in dbms.directories.import.
Then, in Cypher:
LOAD CSV FROM 'file:///Natto_Sample.csv'
*If the file is in a subfolder, use / to separate the folders in the URL.
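When an import directory is configured, the URL is written relative to that directory. The sketch below shows one way to derive it, including the subfolder case; import_relative_url is a hypothetical helper, and the directory names in the usage lines are made up for illustration.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def import_relative_url(import_dir, csv_path):
    """Build a file:/// URL for csv_path relative to the Neo4j import directory."""
    # relative_to raises ValueError if csv_path is not inside import_dir.
    relative = PureWindowsPath(csv_path).relative_to(PureWindowsPath(import_dir))
    # as_posix() turns backslashes into the forward slashes that URLs need.
    return "file:///" + quote(relative.as_posix())


print(import_relative_url(r"C:\neo4j\import", r"C:\neo4j\import\Natto_Sample.csv"))
# file:///Natto_Sample.csv
print(import_relative_url(r"C:\neo4j\import", r"C:\neo4j\import\data\Natto_Sample.csv"))
# file:///data/Natto_Sample.csv
```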
The syntax for loading a CSV with headers, as described in the CSV Import Guide, is:
LOAD CSV WITH HEADERS FROM "file-url" AS line
where the file-url, for local files, is:
file:///data.csv
Put the CSV file in the import directory, and that should work.
You don't need to write the C: reference:
LOAD CSV WITH HEADERS FROM 'file:/Users/Diana/Documents/Nattosphere/Natto_Sample.csv'

DXGI_FORMAT in dds file

I am parsing a DDS file to read its header data. I want to modify the format of the image, but the header described at this site does not seem to specify where the DXGI_FORMAT (internal format) is stored. Where in the file can I find the internal format?
For example, the value of DXGI_FORMAT_BC1_UNORM is 71, but I did not find it in the header.
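For context, in the DDS layout documented by Microsoft the DXGI_FORMAT value is not part of the basic 124-byte DDS_HEADER. It is the first field of the optional DDS_HEADER_DXT10 extension, which follows the basic header only when the pixel format's dwFourCC is "DX10". Files with a legacy FourCC such as "DXT1" store no DXGI_FORMAT number at all; the format is implied by the FourCC. A minimal sketch of that lookup, assuming the whole file is already in memory (read_dxgi_format is a hypothetical helper name):

```python
import struct

DDS_MAGIC = b"DDS "
# 4-byte magic + offset of ddspf inside DDS_HEADER (72) + offset of dwFourCC (8).
FOURCC_OFFSET = 4 + 72 + 8
# 4-byte magic + size of DDS_HEADER (124); DDS_HEADER_DXT10 starts here.
DX10_HEADER_OFFSET = 4 + 124


def read_dxgi_format(data):
    """Return the DXGI_FORMAT value, or None if the file has no DX10 header."""
    if data[:4] != DDS_MAGIC:
        raise ValueError("not a DDS file")
    four_cc = data[FOURCC_OFFSET:FOURCC_OFFSET + 4]
    if four_cc != b"DX10":
        # Legacy file: the format is implied by the FourCC or the bit masks.
        return None
    # dxgiFormat is the first little-endian DWORD of DDS_HEADER_DXT10.
    (dxgi_format,) = struct.unpack_from("<I", data, DX10_HEADER_OFFSET)
    return dxgi_format
```

For a file with a DX10 header stored as BC1, this would return 71 (DXGI_FORMAT_BC1_UNORM).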

Resources