Since yesterday I have been trying to figure out why my table creation is not working. Since I can't link my Impala to my HBase, I can't run queries on my Twitter stream :/
Do I need a special JAR for the SerDe properties, like Hive does?
Here is my command:
CREATE EXTERNAL TABLE HB_IMPALA_TWEETS (
id int,
id_str string,
text string,
created_at timestamp,
geo_latitude double,
geo_longitude double,
user_screen_name string,
user_location string,
user_followers_count string,
user_profile_image_url string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" =
":key,tweet:id_str,tweet:text,tweet:created_at,tweet:geo_latitude,tweet:geo_longitude, user:screen_name,user:location,user:followers_count,user:profile_image_url"
)
TBLPROPERTIES("hbase.table.name" = "tweets");
But I get an error on the STORED BY clause:
Query: create EXTERNAL TABLE HB_IMPALA_TWEETS ( id int, id_str string, text string, created_at timestamp, geo_latitude double, geo_longitude double, user_screen_name string, user_location string, user_followers_count string, user_profile_image_url string ) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = ":key,tweet:id_str,tweet:text,tweet:created_at,tweet:geo_latitude,tweet:geo_longitude, user:screen_name,user:location,user:followers_count,user:profile_image_url" ) TBLPROPERTIES("hbase.table.name" = "tweets")
ERROR: AnalysisException: Syntax error in line 1:
...image_url string ) STORED BY 'org.apache.hadoop.hive.h...
Encountered: BY
Expected: AS
CAUSED BY: Exception: Syntax error
For info, I followed this page:
https://github.com/AronMacDonald/Twitter_Hbase_Impala/blob/master/README.md
Thanks for helping me :)
Well, it seems that Impala still does not support custom SerDes (serialization/deserialization).
"You create the tables on the Impala side using the Hive shell,
because the Impala CREATE TABLE statement currently does not support
custom SerDes and some other syntax needed for these tables: You
designate it as an HBase table using the STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler' clause on the Hive
CREATE TABLE statement."
So, just run the CREATE TABLE command in the Hive shell (or Hue's Hive editor); then, in Impala, type 'invalidate metadata', and you can see your table with a 'show tables'.
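For example (a minimal sketch; the table comes from the CREATE statement above, and the final query is just an illustration):

-- run the CREATE EXTERNAL TABLE ... STORED BY statement above in the Hive shell first
-- then, in impala-shell:
INVALIDATE METADATA;
SHOW TABLES;
-- the HBase-backed table should now be queryable from Impala:
SELECT COUNT(*) FROM hb_impala_tweets;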
So for this part the problem seems solved.
code:
from pyflink.table import DataTypes
from pyflink.table.udf import udf

func = udf(log_parser, result_type=DataTypes.ROW(
    [DataTypes.FIELD("ts", DataTypes.TIMESTAMP(precision=3)),
     DataTypes.FIELD("clientip", DataTypes.STRING()),
     DataTypes.FIELD("recordtime", DataTypes.STRING())]))

table = table.map(func)
table.print_schema()
output:
(
`_c0` TIMESTAMP(3),
`_c1` STRING,
`_c2` STRING
)
This looks strange to me; shouldn't it print a schema with the defined column names?
This is a known issue that is addressed in FLINK-27282. It was fixed recently, so the fix has not been released yet.
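Until you can upgrade to a release containing that fix, one possible workaround (a sketch, assuming the field order from the log_parser UDF above) is to rename the columns explicitly:

# alias() renames the positional columns produced by map();
# the order must match the ROW fields in the UDF's result_type
table = table.map(func).alias("ts", "clientip", "recordtime")
table.print_schema()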
I'm using the latest version of apollo-ios, but I'd like to solve this one lingering problem: I keep getting optional values (see image below).
Here's what I've explored (but I still can't find why):
When I created the table, Nullable was false. Then I created a view for the public to access it.
With the apollo schema:download command, here's the generated JSON: schema.json
With the graphqurl command, here's the generated schema.graphql: schema.graphql. Here's the snippet:
"""
columns and relationships of "schedule"
"""
type schedule {
activity: String
end_at: timestamptz
id: Int
"""An array relationship"""
speakers(
"""distinct select on columns"""
distinct_on: [talk_speakers_view_select_column!]
"""limit the number of rows returned"""
limit: Int
"""skip the first n rows. Use only with order_by"""
offset: Int
"""sort the rows by one or more columns"""
order_by: [talk_speakers_view_order_by!]
"""filter the rows returned"""
where: talk_speakers_view_bool_exp
): [talk_speakers_view!]!
start_at: timestamptz
talk_description: String
talk_type: String
title: String
}
I suspect that id: Int missing the ! in the schema is what causes codegen to interpret it as optional, but I could be wrong. Here's the repo for complete reference: https://github.com/vinamelody/MyApolloTest/tree/test
It's because Postgres marks view columns as explicitly nullable, regardless of the underlying column nullability, for some unknown reason.
Vamshi (core Hasura server dev) explains it here in this issue:
https://github.com/hasura/graphql-engine/issues/1965
You don't need that view though -- it's the same as doing a query:
query {
  talks(
    where: { activity: { _like: "iosconfig21%" } },
    order_by: { start_at: asc }
  ) {
    id
    title
    start_at
    <rest of fields>
  }
}
Except now you have a view you need to manage in your Hasura metadata and create permissions for, like a regular table, on top of the table it's selecting from. My $0.02 anyways.
You can even use a GraphQL alias if you really insist on it being called "schedule" in the JSON response
https://graphql.org/learn/queries/
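For example (a sketch reusing the field names from the schema snippet above):

query {
  schedule: talks(where: { activity: { _like: "iosconfig21%" } }) {
    id
    title
    start_at
  }
}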
I want to select a single row, with all columns, from my table ZBOOKINGS. The ZBOOKINGS table has a structure based on the ZBOOKING data structure - see below.
Table ZBOOKINGS:
Structure ZBOOKING:
My BOOKINGSET_GET_ENTITY method:
method BOOKINGSET_GET_ENTITY.
  DATA: ls_keytab TYPE LINE OF /IWBEP/T_MGW_NAME_VALUE_PAIR,
        i_carrid  TYPE string,
        i_connid  TYPE string,
        i_fldate  TYPE string,
        i_bookid  TYPE string.

  LOOP AT it_key_tab INTO ls_keytab.
    CASE ls_keytab-name.
      WHEN 'Carrid'.
        i_carrid = ls_keytab-value.
      WHEN 'Connid'.
        i_connid = ls_keytab-value.
      WHEN 'Fldate'.
        i_fldate = ls_keytab-value.
      WHEN 'Bookid'.
        i_bookid = ls_keytab-value.
    ENDCASE.
  ENDLOOP.

  SELECT SINGLE *
    INTO CORRESPONDING FIELDS OF er_entity
    FROM ybookings AS a
    WHERE a~carrid = i_carrid AND
          a~connid = i_connid AND
          a~fldate = i_fldate AND
          a~bookid = i_bookid.
endmethod.
I tested it via the SAP Gateway Client. It's OK when I remove the column LUGGWEIGHT from my SELECT SINGLE * statement. However, when I select all columns via SELECT SINGLE *, it outputs an error:
Runtime Error: 'SAPSQL_PARSER_TODO_WARNING'
<?xml version="1.0" encoding="UTF-8"?>
<error>
<code>SAPSQL_PARSER_TODO_WARNING</code>
<message>Runtime Error: 'SAPSQL_PARSER_TODO_WARNING'.
The OData request processing has been abnormal terminated. If "Runtime Error"
is not initial, launch transaction ST22 for details and analysis. Otherwise,
launch transaction SM21 for system log analysis.</message>
<timestamp>20190905144432</timestamp>
</error>
As you can see, the problem is with the LUGGWEIGHT field, which is of a quantity type and whose typing method is "Type ref to". When I check my BOOKINGSET_GET_ENTITY method via Ctrl+F2, it outputs a warning:
The database field or the result type of the aggregate function LUGGWEIGHT and the component "LUGGWEIGHT" of "ER_ENTITY" are not compatible.
How should I modify my SELECT query / BOOKINGSET_GET_ENTITY method for it to work?
The LUGGWEIGHT field's typing method should be set to "Types" (not "Type ref to") when creating / modifying the ZBOOKING data structure.
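For illustration, here is roughly what the difference looks like expressed as an ABAP TYPES declaration (a sketch; the S_* data elements are assumed from the standard flight model, so adjust them to your structure):

* typing method "Types": LUGGWEIGHT is a flat quantity component,
* which SELECT ... INTO CORRESPONDING FIELDS OF er_entity can fill
TYPES: BEGIN OF ty_zbooking,
         carrid     TYPE s_carr_id,
         connid     TYPE s_conn_id,
         fldate     TYPE s_date,
         bookid     TYPE s_book_id,
         luggweight TYPE s_lugweigh,
       END OF ty_zbooking.

* typing method "Type ref to" would instead make the component a reference:
*   luggweight TYPE REF TO s_lugweigh
* Open SQL cannot move a database value into a reference, hence the error.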
I am creating my first Association Mapping for a Join. This is also the first time I've used a Foreign Key in pgSQL.
I am working with ZF3. The error I am receiving is:
An exception occurred while executing 'SELECT p0_.reference AS reference_0, p0_.meta_keyword_reference AS meta_keyword_reference_1, p0_.add_date AS add_date_2, p0_.add_membership_reference AS add_membership_reference_3, p0_.remove_date AS remove_date_4, p0_.remove_membership_reference AS remove_membership_reference_5 FROM page_about_meta_keyword_link p0_ INNER JOIN meta_keyword m1_':
SQLSTATE[42601]: Syntax error: 7 ERROR: syntax error at end of input LINE 1: ...page_about_meta_keyword_link p0_ INNER JOIN meta_keyword m1_
The query I am trying to create is
SELECT MetaKeywords.Keyword FROM PageAboutMetaKeywordLink INNER JOIN MetaKeywords ON PageAboutMetaKeywordLink.MetaKeywordReference = MetaKeywords.Reference WHERE PageAboutMetaKeywordLink.RemoveDate IS NULL ORDER BY MetaKeywords.Keyword ASC
From my database experience, I expect it is creating the error due to the missing
ON p0_.meta_keyword_reference = m1_.reference
I don't understand how to communicate the Join. Based on the documentation I had expected this was automatic. Maybe I misunderstood.
The columns I am trying to join are page_about_meta_keyword_link.meta_keyword_reference ON meta_keyword.reference.
This is the table structure for page_about_meta_keyword_link
CREATE TABLE public.page_about_meta_keyword_link
(
reference bigint NOT NULL DEFAULT nextval('page_about_meta_keyword_link_reference_seq'::regclass),
meta_keyword_reference bigint,
add_date timestamp with time zone DEFAULT now(), -- UTC
add_membership_reference bigint,
remove_date timestamp with time zone, -- UTC
remove_membership_reference bigint,
CONSTRAINT page_about_meta_keyword_link_pkey PRIMARY KEY (reference),
CONSTRAINT page_about_meta_keyword_link_fk FOREIGN KEY (meta_keyword_reference)
REFERENCES public.meta_keyword (reference) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT page_about_meta_keyword_link_reference_unique UNIQUE (reference)
)
This is the meta_keyword
CREATE TABLE public.meta_keyword
(
reference bigint NOT NULL DEFAULT nextval('meta_keyword_reference_seq'::regclass),
keyword text,
effective_date timestamp with time zone DEFAULT now(), -- UTC
membership_reference bigint,
CONSTRAINT meta_keyword_pkey PRIMARY KEY (reference),
CONSTRAINT meta_keyword_reference_unique UNIQUE (reference)
)
This is the query I've created in the service; the complete service is found here.
$repository = $this->entityManager->getRepository(PageAboutMetaKeywordLink::class);
$keywords = $this->entityManager->getRepository(MetaKeyword::class);
$qb = $repository->createQueryBuilder('l');
$qb ->join('\Application\Entity\MetaKeyword' , 'k')
->expr()->isNull('l.removeDate');
return $qb->getQuery()->getResult();
The Association Mapping I created is for meta_keyword_reference; The complete Entity is found here.
/**
* @var int|null
*
* @ORM\ManyToOne(targetEntity="MetaKeyword")
* @ORM\JoinColumn(name="meta_keyword_reference", referencedColumnName="reference")
* @ORM\Column(name="meta_keyword_reference", type="bigint", nullable=true)
*/
private $metaKeywordReference;
I have not made any changes to the MetaKeywords Entity. It is found here.
Overall, the various sections of the web site will share the meta_keywords. If I understand correctly, the connection I am trying to make is ManyToOne.
I want to leave a good reference for other newbies as they start their journey with Zend Framework 3 and Doctrine. Please advise of edits I should make to this post so it is clear, understandable, and concise, so that I receive the help I need and others benefit from it in the future.
You double-declared a column (meta_keyword_reference). Looking at the docs (the same page you linked in the question), you've made a mistake in your annotation. Remove the ORM\Column line (the definition is already in JoinColumn). If you need it to be nullable (not required), add nullable=true to the JoinColumn; use either, not both:
/**
* @var int|null
*
* @ORM\ManyToOne(targetEntity="MetaKeyword")
* @ORM\JoinColumn(name="meta_keyword_id", referencedColumnName="id", nullable=true)
*/
private $metaKeywordReference;
Do not worry about declaring a "type"; Doctrine will automatically match it to the column you're referencing. Also, you should be referencing primary keys. I've assumed reference is not the PK, so I've changed it to id; change it to whatever it actually is.
Next, I think you're also using the DBAL QueryBuilder instead of the ORM QueryBuilder.
The Query you need would be like this:
use Doctrine\ORM\Query\Expr\Join;
use Doctrine\ORM\QueryBuilder;
/** #var QueryBuilder $qb */
$qb = $this->entityManager->createQueryBuilder();
$qb->select('l')
->from(PageAboutMetaKeywordLink::class, 'l')
->join(MetaKeyword::class, 'k', Join::WITH, 'l.reference = k.id') // arbitrary entity joins in DQL use WITH, not ON; check these property names (NOT DB COLUMNS!)
->where('l.removeDate is null');
Might be a few small errors in there, but that should be about it.
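To execute it and fetch the rows, same as in your original service method:

return $qb->getQuery()->getResult();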
I'm using legacy indexing (now called manual indexing). After migrating from Neo4j 2 to version 3, I have some problems with numeric sorting.
An example of a correct statement in Neo4j 2:
queryContext.sort(new Sort(new SortField(AGE, SortField.INT, false)));
This statement must be changed for Neo4j 3 (Lucene 5):
queryContext.sort(new Sort(new SortField(AGE, SortField.Type.INT, false)));
But if you use this sort statement you will get an exception:
java.lang.IllegalStateException: unexpected docvalues type SORTED_SET for field 'firstName' (expected=SORTED). Use UninvertingReader or index with docvalues.
at org.apache.lucene.index.DocValues.checkField(DocValues.java:208)
at org.apache.lucene.index.DocValues.getSorted(DocValues.java:264)
at org.apache.lucene.search.FieldComparator$TermOrdValComparator.getSortedDocValues(FieldComparator.java:762)
at org.apache.lucene.search.FieldComparator$TermOrdValComparator.getLeafComparator(FieldComparator.java:767)
at org.apache.lucene.search.FieldValueHitQueue.getComparators(FieldValueHitQueue.java:183)
at org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector.getLeafCollector(TopFieldCollector.java:164)
at org.neo4j.kernel.api.impl.index.collector.DocValuesCollector.replayTo(DocValuesCollector.java:297)
at org.neo4j.kernel.api.impl.index.collector.DocValuesCollector.getTopDocs(DocValuesCollector.java:275)
at org.neo4j.kernel.api.impl.index.collector.DocValuesCollector.getIndexHits(DocValuesCollector.java:150)
at org.neo4j.index.impl.lucene.legacy.LuceneLegacyIndex.search(LuceneLegacyIndex.java:346)
at org.neo4j.index.impl.lucene.legacy.LuceneLegacyIndex.query(LuceneLegacyIndex.java:261)
at org.neo4j.index.impl.lucene.legacy.LuceneLegacyIndex.query(LuceneLegacyIndex.java:205)
at org.neo4j.index.impl.lucene.legacy.LuceneLegacyIndex.query(LuceneLegacyIndex.java:217)
at org.neo4j.kernel.impl.api.StateHandlingStatementOperations.nodeLegacyIndexQuery(StateHandlingStatementOperations.java:1440)
at org.neo4j.kernel.impl.api.OperationsFacade.nodeLegacyIndexQuery(OperationsFacade.java:1162)
at org.neo4j.kernel.impl.coreapi.LegacyIndexProxy$Type$1.query(LegacyIndexProxy.java:83)
at org.neo4j.kernel.impl.coreapi.LegacyIndexProxy.query(LegacyIndexProxy.java:365)
I think this is caused by a newly added statement in the Neo4j indexer class (is Neo4j now indexing fields for sorting automatically?). See:
org.neo4j.index.impl.lucene.legacy.IndexType CustomType addToDocument( Document document, String key, Object value )
the new line:
document.add( instantiateSortField( key, value ) );
and the instantiateSortField method creates a SortedSetDocValuesField.
So I changed my code to:
queryContext.sort(new Sort(new SortedSetSortField(AGE, false)));
This runs OK, but sorting does not work because the numbers are sorted as strings. I see that the "value" parameter is a String every time in the addToDocument method. I think the root cause is explained in this old comment in the class org.neo4j.index.impl.lucene.legacy.IndexType CustomType:
// TODO We should honor ValueContext instead of doing value.toString() here.
// if changing it, also change #get to honor ValueContext.
Am I missing some new way to index, search, and sort data in Neo4j 3, or is it really a problem that values are indexed as strings in Neo4j?
A simple unit test for Neo4j 2 and Neo4j 3 can be downloaded
Solution added by MishaDemianenko at GH issue
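In short, the idea is to index numeric values with ValueContext, so Lucene stores them as numbers rather than value.toString(), and to request a numeric sort through QueryContext. A sketch based on the legacy-index API (index, node, age, and AGE are assumed from the question's context):

import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.index.Index;
import org.neo4j.graphdb.index.IndexHits;
import org.neo4j.index.lucene.QueryContext;
import org.neo4j.index.lucene.ValueContext;

// index the value as a number instead of value.toString()
index.add( node, AGE, new ValueContext( age ).indexNumeric() );

// sort numerically instead of constructing the Lucene SortField yourself
QueryContext queryContext = new QueryContext( "*:*" ).sortNumeric( AGE, false );
IndexHits<Node> hits = index.query( queryContext );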