I have a JSON input to fluentd like this.
{ "dateTime": "YYYY-mm-dd HH:MM:SS" }
I would like to store this message to Google BigQuery table table_YYYYmmdd.
I know I can write config like
<match tag>
  @type bigquery
  table table_%Y%m%d
  :
</match>
But the date in the JSON message is not today. I would like to store the record in accordance with the date in the JSON message.
How can I do this?
Self-solved:
<filter tag>
  @type record_transformer
  enable_ruby true
  <record>
    yyyymmdd ${record["dateTime"][0..9].gsub("-", "")}
  </record>
</filter>
<match tag>
  @type bigquery
  ignore_unknown_values true
  table table_${yyyymmdd}
  :
</match>
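The `enable_ruby` expression in the record_transformer filter above is plain Ruby. A minimal standalone sketch of what it does (the sample record value here is a hypothetical date):

```ruby
# Take the first 10 characters of dateTime ("YYYY-mm-dd") and strip the dashes,
# producing the table suffix used by the bigquery output.
record = { "dateTime" => "2015-03-14 10:20:30" }  # hypothetical sample
yyyymmdd = record["dateTime"][0..9].gsub("-", "")
puts yyyymmdd  # => "20150314"
```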
I am trying to filter out log entries that contain a specific word. We get tons of login and logout events in our logs, and I don't want to ship those entries; I want to filter them out. I looked into the grep filter plugin, and the way I understand it, it seems straightforward enough (grep the message for a specific word and exclude it), but my setup isn't working: I am still seeing the log entries in Splunk.
example log entry:
{"message":"Aug 21 09:46:15 linuxhost OSd[15]: logout(11100) usec=69895 tz=-07:00 seq=4569812 category=audit user=admin client-pid=154872 : logout"}
example section from my td-agent.conf:
<filter login.logout>
  @type grep
  <exclude>
    key message
    pattern login
    pattern logout
  </exclude>
</filter>
Try one of the following:
<filter login.logout>
  @type grep
  <exclude>
    key message
    pattern /login|logout/
  </exclude>
</filter>
Or
<filter login.logout>
  @type grep
  <or>
    <exclude>
      key message
      pattern /login/
    </exclude>
    <exclude>
      key message
      pattern /logout/
    </exclude>
  </or>
</filter>
See https://docs.fluentd.org/v1.0/articles/filter_grep for more details.
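As a quick sanity check outside Fluentd, the combined pattern can be tested in plain Ruby against the example log line from the question:

```ruby
# Verify that the single alternation pattern matches the example message.
log = "Aug 21 09:46:15 linuxhost OSd[15]: logout(11100) usec=69895 tz=-07:00 seq=4569812 category=audit user=admin client-pid=154872 : logout"
pattern = /login|logout/
matched = !!(log =~ pattern)
puts matched  # => true
```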
I was able to get it to work by separating out into two filters.
<filter ems>
  @type grep
  <exclude>
    key message
    pattern login
  </exclude>
</filter>
<filter ems>
  @type grep
  <exclude>
    key message
    pattern logout
  </exclude>
</filter>
I am redirecting URL in the config file. This is what I am doing:
<rule name="award temp redirect" stopProcessing="true">
<match url="(.*?)/?award.aspx$" />
<action type="Redirect" url="http://www.abc.com.au/award.aspx" redirectType="Temporary" />
</rule>
This works fine for /award.aspx, but I want it to cover /awards and /award as well. Currently I have created separate rules for /awards and /award.
Can I combine these into one match, so that if the URL is /award.aspx, /award, or /awards it redirects to http://www.abc.com.au/award.aspx?
The match element accepts a regex for the url value. Try:
<match url="(.*?)/?awards?(\.aspx)?$" />
This will match award, award.aspx, awards and awards.aspx.
Update to match /test as well:
<match url="(.*?)/?(awards?|test)(\.aspx)?$" />
It is just a regular expression. You can match whatever you want.
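A rough check of the same expression in Ruby (IIS URL Rewrite uses .NET regex, but the syntax is identical for this pattern):

```ruby
# Confirm the pattern matches all the URL variants from the question.
url_pattern = %r{(.*?)/?(awards?|test)(\.aspx)?$}
paths = ["award", "award.aspx", "awards", "awards.aspx", "test"]
results = paths.map { |p| !!(p =~ url_pattern) }
puts results.inspect  # => [true, true, true, true, true]
```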
I have a field with the definition below. It works perfectly in analysis, but when I query it that way, the query analysis behaves differently. What am I missing?
data: thd_keyphrase: Privately held companies based in California,Social media,Privately held companies
query: q=thd_keyphrase:find any social media
In analysis, the query is processed this way: |find any|any social|social media, and it matches "Social media".
The output from the debug query is different:
"rawquerystring": "thd_keyphrase:find any social media",
"querystring": "thd_keyphrase:find any social media",
"parsedquery": "thd_keyphrase:find text:ani text:social text:media",
"parsedquery_toString": "thd_keyphrase:find text:ani text:social text:media",
Or, when I remove the default field text: "msg": "no field name specified in query and no default specified via 'df' param"
<fieldType name="keyphrase" class="solr.TextField" omitNorms="false" termVectors="false" multiValued="false">
<analyzer type="index">
<tokenizer class="solr.PatternTokenizerFactory" pattern="\s*,\s*"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="5"
outputUnigrams="false" outputUnigramsIfNoShingles="true" tokenSeparator=" "/>
<!-- <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true" enablePositionIncrements="false"/>-->
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</types>
Since you have spaces in the text string, make sure to surround it with double quotes, like so:
q=thd_keyphrase:"find any social media"
Also, did you mean to tokenize the field on commas?
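If you are building the request URL yourself, the quoted query also needs URL-encoding. A small Ruby sketch (the field name comes from the question):

```ruby
require "cgi"

# Quote the multi-word phrase so the query parser treats it as one clause,
# then URL-encode it for use as the q parameter.
q = 'thd_keyphrase:"find any social media"'
puts "q=" + CGI.escape(q)
# => q=thd_keyphrase%3A%22find+any+social+media%22
```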
I am having a problem with date format conversion while exporting data from MySQL to CSV.
I am using Scriptella 1.1, the latest one I guess.
Here is my etl.properties file:
driver=mysql
url=jdbc:mysql://localhost:3306/<my_DB_name>
user=<user_name>
password=<password>
classpath=/path/to/mysql-connector-java-5.1.19.jar;
here is my etl.xml file:
<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
<description>Scriptella ETL File Template.</description>
<properties>
<include href="/path/to/etl.properties"/> <!--Load from external properties file-->
</properties>
<!-- Connection declarations -->
<connection id="in" driver="${driver}" url="${url}" user="${user}" password="${password}" classpath="$classpath">
</connection>
<connection id="out" driver="csv" url="report.csv">
#Use empty quote to turn off quoting
quote=
null_string=\\N
format.dob.type=date
format.dob.pattern=yyyy-MM-dd HH:mm:ss
</connection>
<query connection-id="in">
SELECT * FROM test;
<script connection-id="out">
$1,$2,$3,$4
</script>
</query>
</etl>
dob is my column name in the MySQL table; it is a datetime column there.
Now when I export the data from MySQL, the time comes out in the format yyyy-MM-dd HH:mm:ss.S,
but I want yyyy-MM-dd HH:mm:ss, so I have used
format.dob.type=date
format.dob.pattern=yyyy-MM-dd HH:mm:ss
The following link suggests that Scriptella 1.1 has this feature and explains how to use it:
http://scriptella.javaforge.com/reference/index.html
But it is not working.
Can anyone help me out?
Thanks. :)
Formatting rules are applied to the variable name, therefore exactly the same variable name should be used for both the format description and the expansion placeholder. In your case, try using $dob instead of the column number.
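For example, assuming dob is one of the columns returned by the SELECT, the output script would reference it by name so the format.dob.* rules apply (the other positional references here are left as in the question):

```xml
<query connection-id="in">
    SELECT * FROM test;
    <script connection-id="out">
        $dob,$2,$3,$4
    </script>
</query>
```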
I am using sunspot_rails to submit queries to a Solr instance. Everything works ok, but I want to order my results with the following criteria: I want to take first the documents where the matching term appears as word rather than as part of a word.
Hence, if I have the two documents:
1) Solr searching with Solr is fantastic
and
2) Solr is very good to support search with free text
and the term I am looking for is "search", then
I want both documents in the results, but I want document (2) to appear first.
I have tried order_by :score, :desc but it does not seem to be working, unless there is a way to control how the "score" is calculated.
Thanks in advance
Panayotis
You would need to maintain two fields in Solr:
one with the original value and the other with the analyzed value, e.g. text_org and text (which is analyzed).
Then you can adjust the boosts accordingly, boosting the original field over the analyzed one, e.g. text_org^2 text^1.
Remember that if a query matches the original field it will also match the analyzed one, so the net effect is that an exact whole-word match scores higher than a normal match.
Expanding on Jayendra's answer a bit, you should index into two separate fields.
Here's an example schema.xml excerpt for Sunspot, from my answer to an earlier question: How to boost longer ngrams in solr?
<schema>
<types>
<!--
A text type with minimal text processing, for the greatest semantic
value in a term match. Boost this field heavily.
-->
<fieldType name="text" class="solr.TextField" omitNorms="false">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.StandardFilterFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
<!--
Looser matches with NGram processing for substrings of terms and synonyms
-->
<fieldType name="text_ngram" class="solr.TextField" omitNorms="false">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.StandardFilterFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="6" side="front" />
</analyzer>
</fieldType>
<!-- other stuff -->
</types>
<fields>
<!-- other fields; refer to *_text -->
<dynamicField name="*_ngram" type="text_ngram" ... />
</fields>
</schema>
In your searchable block, you can use the :as option to specify the fieldname:
searchable do
text :title
text :title, :as => :title_ngram
# ...
end
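With both fields indexed, the boost can then be applied at query time. A hypothetical Sunspot sketch (the model name, field handles, and boost values are assumptions, and the exact DSL details vary by Sunspot version):

```ruby
Post.search do
  fulltext "search" do
    # Weight the lightly-analyzed title field above the ngram variant, so
    # whole-word matches score higher than substring matches.
    fields(:title => 2.0, :title_ngram => 1.0)
  end
end
```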