I am new to InfluxDB and working in a Windows environment, trying to import a file for a batch insert. I'd appreciate any help. Thanks.
Below is a sample of the file; every line is terminated with a line feed (\n).
# DML
# CONTEXT-DATABASE: StatsArchive
# CONTEXT-RETENTION-POLICY: oneyear
DbSpaceUsage,Servername=test,DatabaseName=testdb,FileType=log,FileSizeMb=222999 AvgUsedSpaceMB=191883i MinUsedSpaceMB=191089i MaxUsedSpaceMB=192198i 1442188800
DbSpaceUsage,Servername=test,DatabaseName=testdb,FileType=Data,FileSizeMb=55996 AvgUsedSpaceMB=160i MinUsedSpaceMB=47i MaxUsedSpaceMB=357i 1442361600
Command and output:
influx.exe -import -path=C:\stats.csv -precision=s
2016/07/19 22:39:08 error writing batch: {"error":"unable to parse 'DbSpaceUsage,Servername=test,DatabaseName=testdb,FileType=log,FileSizeMb=222999 AvgUsedSpaceMB=191883i MinUsedSpaceMB=191089i MaxUsedSpaceMB=192198i 1442188800': bad timestamp\nunable to parse 'DbSpaceUsage,Servername=test,DatabaseName=testdb,FileType=Data,FileSizeMb=55996 AvgUsedSpaceMB=160i MinUsedSpaceMB=47i MaxUsedSpaceMB=357i 1442361600': bad timestamp\nunable to parse
The data that you've listed isn't in line-protocol. The general structure of line-protocol is as follows:
<measurement>[,<tag>[,<tag>] ...] <field>[,<field> ...] <timestamp>
I've adjusted the example you've given to be in line protocol below:
# DML
# CONTEXT-DATABASE: StatsArchive
# CONTEXT-RETENTION-POLICY: oneyear
DbSpaceUsage,Servername=test,DatabaseName=testdb,FileType=log FileSizeMb=222999,AvgUsedSpaceMB=191883i,MinUsedSpaceMB=191089i,MaxUsedSpaceMB=192198i 1442188800
DbSpaceUsage,Servername=test,DatabaseName=testdb,FileType=Data FileSizeMb=55996,AvgUsedSpaceMB=160i,MinUsedSpaceMB=47i,MaxUsedSpaceMB=357i 1442361600
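Purely as an illustration (this helper is not part of InfluxDB or of the original answer; the function name is made up), a small Python sketch that assembles one line-protocol entry from dicts of tags and fields, which makes the tag/field/timestamp split explicit:
import sys

def to_line_protocol(measurement, tags, fields, timestamp):
    # tags ride with the measurement, fields come after a space, timestamp is last
    tag_str = ','.join('{}={}'.format(k, v) for k, v in sorted(tags.items()))
    field_str = ','.join('{}={}'.format(k, v) for k, v in sorted(fields.items()))
    return '{},{} {} {}'.format(measurement, tag_str, field_str, timestamp)

print(to_line_protocol(
    'DbSpaceUsage',
    {'Servername': 'test', 'DatabaseName': 'testdb', 'FileType': 'log'},
    {'FileSizeMb': 222999, 'AvgUsedSpaceMB': '191883i',
     'MinUsedSpaceMB': '191089i', 'MaxUsedSpaceMB': '192198i'},
    1442188800))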
I am using IForest as described here: https://github.com/titicaca/spark-iforest
But model.save() is throwing an exception:
Exception:
scala.NotImplementedError: The default jsonEncode only supports string, vector and matrix. org.apache.spark.ml.param.Param must override jsonEncode for java.lang.Double.
I followed the code snippet in the "Python API" section of the GitHub page mentioned above.
from pyspark.ml.feature import VectorAssembler
import os
import tempfile
from pyspark_iforest.ml.iforest import *

# input dataframe df schema: col_1:integer, col_2:integer, col_3:integer
in_cols = ["col_1", "col_2", "col_3"]

assembler = VectorAssembler(inputCols=in_cols, outputCol="features")
featurized = assembler.transform(df)

iforest = IForest(contamination=0.5, maxDepth=2)
model = iforest.fit(featurized)
model.save("model_path")
model.save() should be able to save model files.
Below is the output dataframe I'm getting after executing model.transform(featurized):
col_1:integer
col_2:integer
col_3:integer
features:udt
anomalyScore:double
prediction:double
I have just fixed this issue. It was caused by an incorrect param type. You can check out the latest code in the master branch and try it again.
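Once the updated code is installed, a save/load round trip should work. A minimal sketch, reusing the featurized dataframe from the question, assuming IForestModel exposes the standard PySpark persistence API, and using an illustrative path:
from pyspark_iforest.ml.iforest import IForest, IForestModel

iforest = IForest(contamination=0.5, maxDepth=2)
model = iforest.fit(featurized)

# persist the fitted model, then load it back and score with the loaded copy
model.save("model_path")
loaded = IForestModel.load("model_path")
scored = loaded.transform(featurized)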
I would like some advice on working around an XML parsing error. In my BLAST XML output, one description contains an '&' character, which is throwing off the SearchIO.parse function.
If I run
qresults = SearchIO.parse(PLAST_output, "blast-xml")
for record in qresults:
    # do some stuff
I get the following error:
cElementTree.ParseError: not well-formed (invalid token): line 13701986, column 30
which directs me to this line:
<Hit_def>Lysosomal & prostatic acid phosphatases [Xanthophyllomyces dendrorhous</Hit_def>
Is there a way to override this in Biopython so I do not have to change my XML file? Right now I'm just using a try/except block, but that is not optimal!
Thanks for your help!
Courtney
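One possible workaround, sketched here as an assumption on my part rather than a Biopython feature: escape bare '&' characters into a temporary copy of the file and parse that, leaving the original XML untouched (the regex is mine and may need tightening for your data):
import re
import tempfile
from Bio import SearchIO

# match '&' that is not already the start of an entity such as &amp; or &#38;
BARE_AMP = re.compile(r'&(?!#?\w+;)')

with open(PLAST_output) as src:
    with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False) as tmp:
        for line in src:
            tmp.write(BARE_AMP.sub('&amp;', line))

for qresult in SearchIO.parse(tmp.name, 'blast-xml'):
    for hit in qresult:
        pass  # do some stuff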
A PicklingError is raised when I run my data pipeline remotely: the data pipeline has been written using the Beam SDK for Python and I am running it on top of Google Cloud Dataflow. The pipeline works fine when I run it locally.
The following code generates the PicklingError and ought to reproduce the problem:
import apache_beam as beam
from apache_beam.transforms import pvalue
from apache_beam.io.fileio import _CompressionType
from apache_beam.utils.options import PipelineOptions
from apache_beam.utils.options import GoogleCloudOptions
from apache_beam.utils.options import SetupOptions
from apache_beam.utils.options import StandardOptions
if __name__ == "__main__":
    pipeline_options = PipelineOptions()
    pipeline_options.view_as(StandardOptions).runner = 'BlockingDataflowPipelineRunner'
    pipeline_options.view_as(SetupOptions).save_main_session = True
    google_cloud_options = pipeline_options.view_as(GoogleCloudOptions)
    google_cloud_options.project = "project-name"
    google_cloud_options.job_name = "job-name"
    google_cloud_options.staging_location = 'gs://path/to/bucket/staging'
    google_cloud_options.temp_location = 'gs://path/to/bucket/temp'
    p = beam.Pipeline(options=pipeline_options)
    p.run()
Below is a sample from the beginning and the end of the Traceback:
WARNING: Could not acquire lock C:\Users\ghousains\AppData\Roaming\gcloud\credentials.lock in 0 seconds
WARNING: The credentials file (C:\Users\ghousains\AppData\Roaming\gcloud\credentials) is not writable. Opening in read-only mode. Any refreshed credentials will only be valid for this run.
Traceback (most recent call last):
File "formatter_debug.py", line 133, in <module>
p.run()
File "C:\Miniconda3\envs\beam\lib\site-packages\apache_beam\pipeline.py", line 159, in run
return self.runner.run(self)
....
....
....
File "C:\Miniconda3\envs\beam\lib\sitepackages\apache_beam\runners\dataflow_runner.py", line 172, in run
self.dataflow_client.create_job(self.job))
StockPickler.save_global(pickler, obj)
File "C:\Miniconda3\envs\beam\lib\pickle.py", line 754, in save_global (obj, module, name))
pickle.PicklingError: Can't pickle <class 'apache_beam.internal.clients.dataflow.dataflow_v1b3_messages.TypeValueValuesEnum'>: it's not found as apache_beam.internal.clients.dataflow.dataflow_v1b3_messages.TypeValueValuesEnum
I've found that your error gets raised when a Pipeline object is included in the context that gets pickled and sent to the cloud:
pickle.PicklingError: Can't pickle <class 'apache_beam.internal.clients.dataflow.dataflow_v1b3_messages.TypeValueValuesEnum'>: it's not found as apache_beam.internal.clients.dataflow.dataflow_v1b3_messages.TypeValueValuesEnum
Naturally, you might ask:
What's making the Pipeline object unpickleable when it's sent to the cloud, since normally it's pickleable?
If this were really the problem, then wouldn't I get this error all the time - isn't a Pipeline object normally included in the context sent to the cloud?
If the Pipeline object isn't normally included in the context sent to the cloud, then why is a Pipeline object being included in my case?
(1)
When you call p.run() on a Pipeline with cloud=True, one of the first things that happens is that p.runner.job=apiclient.Job(pipeline.options) is set in apache_beam.runners.dataflow_runner.DataflowPipelineRunner.run.
Without this attribute set, the Pipeline is pickleable. But once this is set, the Pipeline is no longer pickleable, since p.runner.job.proto._Message__tags[17] is a TypeValueValuesEnum, which is defined as a nested class in apache_beam.internal.clients.dataflow.dataflow_v1b3_messages. AFAIK nested classes cannot be pickled (even by dill - see How can I pickle a nested class in python?).
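A tiny standalone illustration of that point (not Beam code; on the Python 2 interpreters these SDKs ran on, pickle looks classes up by their module-level name, so a nested class cannot be found):
import pickle

class Outer(object):
    class Inner(object):
        pass

try:
    pickle.dumps(Outer.Inner())
except pickle.PicklingError as exc:
    # e.g. Can't pickle <class '__main__.Inner'>: it's not found as __main__.Inner
    print(exc)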
(2)-(3)
Counterintuitively, a Pipeline object is normally not included in the context sent to the cloud. When you call p.run() on a Pipeline with cloud=True, only the following objects are pickled (and note that the pickling happens after p.runner.job gets set):
1. If save_main_session=True, then all global objects in the module designated __main__ are pickled. (__main__ is the script that you ran from the command line.)
2. Each transform defined in the pipeline is individually pickled.
In your case, you encountered #1, which is why your solution worked. I actually encountered #2 where I defined a beam.Map lambda function as a method of a composite PTransform. (When composite transforms are applied, the pipeline gets added as an attribute of the transform...) My solution was to define those lambda functions in the module instead.
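A sketch of that workaround against a current Beam SDK, with made-up names (FormatRecords and format_element are illustrative, not from my pipeline): the mapped function lives at module level, so pickling the applied transform no longer drags the pipeline in through a closure:
import apache_beam as beam

def format_element(element):
    # module-level, so pickle can find it by name
    return str(element).upper()

class FormatRecords(beam.PTransform):
    def expand(self, pcoll):
        # a lambda defined here would be pickled together with this composite
        # transform (and, once applied, its pipeline); a named module-level
        # function avoids that
        return pcoll | beam.Map(format_element)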
A longer-term solution would be for us to fix this in the Apache Beam project. TBD!
This should be fixed in the google-dataflow 0.4.4 sdk release with https://github.com/apache/incubator-beam/pull/1485
I resolved this problem by encapsulating the body of the main within a run() method and invoking run().
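A minimal sketch of that restructuring (option wiring elided): building and running the pipeline inside run() keeps the Pipeline object out of the __main__ globals that save_main_session pickles.
import apache_beam as beam
from apache_beam.utils.options import PipelineOptions

def run():
    pipeline_options = PipelineOptions()
    # ... set runner / project / staging options as above ...
    p = beam.Pipeline(options=pipeline_options)
    # ... apply transforms ...
    p.run()

if __name__ == "__main__":
    run()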
I have a table that I would like to join to an ArcGIS shapefile. My problem is that the table has two Identity fields (i.e. "Plan Number" and "Contract Number") and the shapefile has one Identity field (i.e. "Name"). I want to join the shapefile's "Name" to either the "Plan Number" OR the "Contract Number".
As background, the shapefile is created by manually drawing in polygons in ArcGIS. These polygons represent various projects. The identifier "Name" can either be a project's initial Planning Number, or the Contract Number that exists after the project is budgeted. The Planning Number exists when there is no budget, and the Contract Number comes later. Polygons are created and the "Name" field is filled in with whichever identifying stage (either Planning Number or Contract Number) the project has reached. So, the shapefile field "Name" contains either Planning Numbers or Contract Numbers.
Concurrently, we have a complex Database of all projects with two fields representing both the Planning Number and Contract Number:
PLN        Contract   Phase          Length   NTP         SC          Notes
1415-003   WD-2506    Pre-Planning   45       1/1/1900    1/20/1900   test
To set this up, I created a simple Excel (xlsx) table that links to the Database. This xlsx table has a PLN (Plan Number) field and a Contract (Contract Number) field. In my code, I converted this xlsx to a dbf. I am now trying to find a way to join the shapefile's "Name" to EITHER the "PLN" or the "Contract".
Please see code below:
#Convert xlsx to table:
import xlrd
in_excel= r'W:\\Engineering\\ENGINEER\\LAMP (062012)\\Database\\VisualDatabase\\Planning_Out\\JoinTest.xlsx'
out_table= r'W:\\Engineering\\ENGINEER\\LAMP (062012)\\Database\\VisualDatabase\\Planning_Out\\JoinTest.gdb'
# Perform the conversion
join_table= arcpy.ExcelToTable_conversion(in_excel, out_table)
print join_table
# Join
# Set the local parameters
inFeatures = r'W:\\Engineering\\ENGINEER\\LAMP (062012)\\Database\\VisualDatabase\\Planning_Out\\CDDprojects.shp'
joinField =
joinTable = join_table
fieldList = ["PLN", "Contract", "Phase", "Length", "NTP", "SC", "Notes]
I am unsure what to enter for joinField and whether there is any other code I should include.
REVISION 1:
I used Ethan's code but received an error message at:
with master_table.open():
    with minimal_table.open():
        minimal_index = dbf.create_index(minimal_table, lambda record: record.name)
The error reads:
Traceback (most recent call last):
File "W:\Engineering\ENGINEER\LAMP (062012)\Database\VisualDatabase\LAMP.py", line 53, in <module>
with master_table.open():
AttributeError: 'Result' object has no attribute 'open'
REVISION 2:
I am at a beginner level, so perhaps I am missing something fairly simple. When I try to import dbf, I receive the following error from my code:
Traceback (most recent call last):
File "W:\Engineering\ENGINEER\LAMP (062012)\Database\VisualDatabase\LAMP.py", line 50, in <module>
import dbf
ImportError: No module named dbf
I downloaded the dbf module, but when I run its setup, I receive this warning:
Warning (from warnings module):
File "C:\Python27\ArcGIS10.3\lib\distutils\dist.py", line 267
warnings.warn(msg)
UserWarning: Unknown distribution option: 'install_requires'
I'm not sure what I'm doing wrong when installing the dbf module.
REVISION 3:
I have installed the dbf module and it now imports successfully. However, I am still receiving the same error message:
Traceback (most recent call last):
File "W:\Engineering\ENGINEER\LAMP (062012)\Database\VisualDatabase\LAMP.py", line 56, in <module>
with master_table.open():
AttributeError: 'Result' object has no attribute 'open'
My code is:
#Convert xlsx to table:
import xlrd
in_excel= r'W:\\Engineering\\ENGINEER\\LAMP (062012)\\Database\\VisualDatabase\\Planning_Out\\JoinTest.xlsx'
out_table= r'W:\\Engineering\\ENGINEER\\LAMP (062012)\\Database\\VisualDatabase\\Planning_Out\\JoinTest.gdb'
# Perform the conversion
join_table= arcpy.ExcelToTable_conversion(in_excel, out_table)
import enum
import dbf
# table with all projects at all stages
master_table = join_table
# table with single project and most up-to-date stage
minimal_table = r'W:\\Engineering\\ENGINEER\\LAMP (062012)\\Database\\VisualDatabase\\Planning_Out\\CDDprojects.dbf'
with master_table.open():  # line 56, where the AttributeError is raised
    with minimal_table.open():
        minimal_index = dbf.create_index(minimal_table, lambda record: record.name)
        # cycle through master, updating minimal if necessary
        for master in master_table:
            # look for PLN # first
            found = minimal_index.search(master.PLN)
            if not found:
                # if record doesn't exist with PLN #, try CONTRACT #
                found = minimal_index.search(master.Contract)
I am using the dbf module here: https://pypi.python.org/pypi/dbf
Thanks.
I haven't worked with arcpy (and I'm not entirely certain I understand what you are trying to do), but using my dbf module this is what you do to update/add from the master table to the shape table's dbf file:
import dbf
# table with all projects at all stages
master_table = dbf.Table(complex_table)
# table with single project and most up-to-date stage
minimal_table = dbf.Table(single_project_table)
with master_table.open():
    with minimal_table.open():
        minimal_index = dbf.create_index(minimal_table, lambda record: record.name)
        # cycle through master, updating minimal if necessary
        for master in master_table:
            # look for PLN # first
            found = minimal_index.search(master.pln)
            if not found:
                # if record doesn't exist with PLN #, try CONTRACT #
                found = minimal_index.search(master.contract)
            if not found:
                # not there at all, add it and move on to the next master record
                minimal_table.append(master.contract or master.pln, master.phase, master.length, ...)
                continue
            # have a match, update it
            found.name = master.contract or master.pln
            # plus any other updates you need
            # ...
            # and then write the record
            dbf.write(found)
I am trying to write code that reads a Ruby script entered by the user, stores it in a temp file, and then passes that temp file to jruby -c for syntax validation.
If any errors are found, I have to show the errors along with the script entered by the user, with line numbers.
How would I do this?
If there is a syntax error, you see the failing line number:
blah.rb:2: syntax error, unexpected '.'
So just split your script on newlines (\n) and that's your error line (1-based, of course).
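As a rough sketch of that approach (the wrapper language and the exact jruby -c message format are assumptions on my part): write the script to a temp file, run jruby -c, pull reported line numbers out of the output, and echo the script with those lines flagged.
import re
import subprocess
import tempfile

def check_syntax(script_text):
    # write the user's script to a temp file
    with tempfile.NamedTemporaryFile('w', suffix='.rb', delete=False) as tmp:
        tmp.write(script_text)
        path = tmp.name
    # jruby -c prints "Syntax OK" or lines like "<path>:2: syntax error, ..."
    result = subprocess.run(['jruby', '-c', path], capture_output=True, text=True)
    output = result.stdout + result.stderr
    bad_lines = {int(m.group(1)) for m in re.finditer(r'\.rb:(\d+):', output)}
    # echo the script with 1-based line numbers, flagging reported lines
    for number, line in enumerate(script_text.splitlines(), start=1):
        flag = '  <-- error' if number in bad_lines else ''
        print('{:4d}  {}{}'.format(number, line, flag))
    return output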
If I understand correctly what you are trying to achieve, you can use jruby-parser (install it with gem install jruby-parser) under JRuby to find the line numbers where errors occur:
require 'jruby-parser'

# Read script file into a String
script = File.open('hello.rb', 'r').read

begin
  JRubyParser.parse(script)
rescue Exception => e
  # Display line number
  puts e.position.start_line
  # Display error message
  puts e.message
end