print result was truncated in databricks notebook - printing

To get the schema in JSON format, I used the print(df.schema.json()) command. The schema was too large and was truncated.
Is there any way to show it all?

No, there is a limit on the maximum size of the output in a notebook cell. If you need to store your schema, write it to a file on DBFS, and then copy the file to your local machine, for example by using the databricks-cli:
with open("/dbfs/FileStore/my-schema.json", "w") as f:
f.write(df.schema.json())
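Once the file is written, you can pull it down to your machine with the databricks-cli, for example:
databricks fs cp dbfs:/FileStore/my-schema.json ./my-schema.json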

Related

Input data for Amazon Sagemaker Python SDK

Hello, can someone tell me what format my input data has to be in? Right now I have it in CSV format with the first column being the target variable, but I always get an Algorithm Error, which I think is due to the wrong input data format.
trainpath = sess.upload_data(
    path='revenue_train.csv', bucket=bucket,
    key_prefix='production')

testpath = sess.upload_data(
    path='revenue_test.csv', bucket=bucket,
    key_prefix='production')

# launch training job, with asynchronous call
sklearn_estimator.fit({'train': trainpath, 'test': testpath}, wait=False)
When you use a custom Docker image or a framework estimator (as you do), you can use any file format (CSV, PDF, MP4, whatever you have in S3). The SKLearn container and estimator are agnostic of the file format; it is the role of your user-provided Python code in the estimator to know how to read those files.
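As a rough sketch of what the reading side might look like in a script-mode entry point (assuming channels named 'train' and 'test' as in the fit() call above, and pandas being available; the file name is taken from the question):

import os
import pandas as pd

# SageMaker copies each input channel to /opt/ml/input/data/<channel>
# and exposes the local path via the SM_CHANNEL_<CHANNEL> environment variables.
train_dir = os.environ['SM_CHANNEL_TRAIN']
df = pd.read_csv(os.path.join(train_dir, 'revenue_train.csv'))

y = df.iloc[:, 0]   # first column is the target variable
X = df.iloc[:, 1:]  # remaining columns are the features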

Store Terminal Output in a File ros melodic cpp

I have a pcd file with x y z coordinates from a point cloud.
Now I have another cpp file from which I print x y z coordinates to the terminal. (These are just the coordinates, not a point cloud.)
I want to store this in another file in order to compare it with the pcd file.
How do I do it?
Why do you need to store it directly from stdout? There are a couple of different ways to go about this that are probably easier.
You can simply publish the (x,y,z) data and record it with rosbag record and then export via rosbag_to_csv.
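For example, a minimal rospy publisher sketch (the topic name xyz_points and the get_the_data() helper are placeholders, not from the question):

import rospy
from geometry_msgs.msg import Point

rospy.init_node('xyz_publisher')
pub = rospy.Publisher('xyz_points', Point, queue_size=10)
rate = rospy.Rate(10)  # 10 Hz
while not rospy.is_shutdown():
    x, y, z = get_the_data()  # placeholder for however you obtain the coordinates
    pub.publish(Point(x=x, y=y, z=z))
    rate.sleep()

You could then capture the topic with rosbag record /xyz_points and convert the bag afterwards.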
You could also just write the values to a file directly in the code instead of printing them out. Since you did not specify Python or C++, here is a quick example in Python:
import rospy

f = open('your_output_file.csv', 'a')
while not rospy.is_shutdown():
    # Whatever ops to get the data
    x, y, z = get_the_data()
    output_str = str(x) + ',' + str(y) + ',' + str(z) + '\n'  # newline keeps rows separate
    f.write(output_str)
f.close()
ROS will also automatically log output from the rospy.log*() functions. You can control where this is stored by exporting the environment variable ROS_LOG_DIR. Note that this may not work 100% correctly for print() statements.
Finally, if you really really need to use stdout for some reason you can always redirect the output from however you're launching the node. Ex: roslaunch your_package your_launch.launch >> some_file.txt

How to load a large csv file into Neo4j

I'm trying to load a large CSV file (1,458,644 rows) into Neo4j, but I keep getting this error:
Neo.TransientError.General.OutOfMemoryError: There is not enough memory to perform the current task. Please try increasing 'dbms.memory.heap.max_size' in the neo4j configuration (normally in 'conf/neo4j.conf' or, if you are using Neo4j Desktop, found through the user interface) or if you are running an embedded installation increase the heap by using the '-Xmx' command line flag, and then restart the database.
Even if I change dbms.memory.heap.max_size to 1024m (the m meaning megabytes), the same error occurs!
Note: the size of the CSV is 195.888 KB.
This is my code:
LOAD CSV WITH HEADERS FROM "file:///train.csv" AS line
CREATE (pl:pickup_location {latitude: toFloat(line.pickup_latitude), longitude: toFloat(line.pickup_longitude)}),
       (pt:pickup_time {pickup: line.pickup_datetime}),
       (dl:dropoff_location {latitude: toFloat(line.dropoff_latitude), longitude: toFloat(line.dropoff_longitude)}),
       (dt:dropoff_time {dropoff: line.dropoff_datetime})
CREATE (pl)-[:TLR]->(pt), (dl)-[:TLR]->(dt), (pl)-[:Trip]->(dl);
What should I do?
You should use periodic commits to process the CSV data in batches. For example, this will process 10,000 lines at a time (the default batch size is 1000):
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///train.csv" AS line
CREATE (pl:pickup_location {latitude: toFloat(line.pickup_latitude), longitude: toFloat(line.pickup_longitude)}),
       (pt:pickup_time {pickup: line.pickup_datetime}),
       (dl:dropoff_location {latitude: toFloat(line.dropoff_latitude), longitude: toFloat(line.dropoff_longitude)}),
       (dt:dropoff_time {dropoff: line.dropoff_datetime})
CREATE (pl)-[:TLR]->(pt), (dl)-[:TLR]->(dt), (pl)-[:Trip]->(dl);
I solved the problem by copying the solution for a limited number of rows from here.
So this is my solution:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///train.csv" AS line
WITH line LIMIT 1458644
CREATE (pl:pickup_location {latitude: toFloat(line.pickup_latitude), longitude: toFloat(line.pickup_longitude)}),
       (pt:pickup_time {pickup: line.pickup_datetime}),
       (dl:dropoff_location {latitude: toFloat(line.dropoff_latitude), longitude: toFloat(line.dropoff_longitude)}),
       (dt:dropoff_time {dropoff: line.dropoff_datetime})
CREATE (pl)-[:TLR]->(pt), (dl)-[:TLR]->(dt), (pl)-[:Trip]->(dl);
The downside of this solution is that you need to know the number of rows in your big CSV file (Excel can't open large CSV files).
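If the only obstacle is counting the rows, a short script sidesteps Excel entirely; a minimal sketch (assuming the file has one header line):

with open('train.csv') as f:
    row_count = sum(1 for _ in f) - 1  # subtract the header line
print(row_count)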

Best way to transpose a grid of data in a file

I have large data files of values on a 2D grid.
They are organized such that subsequent rows of data in the grid are subsequent lines in the file.
Each column is separated by a tab character.
Essentially, this is a CSV file, but with tabs instead of commas.
I need to transpose the data (first row becomes first column) and output it to another file. What's the best way to do this? Any language is okay (I prefer Perl or C/C++). Currently, I have a Perl script that just reads the entire file into memory, but I have files which are simply gigantic.
The simplest way would be to make multiple passes through your input, extracting a subset of columns on each pass. The number of columns would be determined by how much memory you wanted to use and how many rows are in the input file.
For example:
On pass 1 you read the entire input file and process only the first, say, 10 columns. If the input had 1 million rows, the output would be a file with 1 million columns and 10 rows. On the next pass you would read the input again and process columns 11 through 20, appending the results to the original output file. And so on.
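For illustration, here is a rough Python sketch of that multi-pass idea (the function name, the chunk_width default, and the assumption of tab-separated rows of equal length are all mine, not from the original answer):

def transpose_in_passes(in_path, out_path, chunk_width=10):
    # Determine the column count from the first line
    with open(in_path) as f:
        ncols = len(f.readline().rstrip('\n').split('\t'))
    with open(out_path, 'w') as out:
        for start in range(0, ncols, chunk_width):
            width = min(chunk_width, ncols - start)
            cols = [[] for _ in range(width)]
            with open(in_path) as f:  # one full pass over the input per chunk
                for line in f:
                    fields = line.rstrip('\n').split('\t')
                    for i in range(width):
                        cols[i].append(fields[start + i])
            for col in cols:  # each input column becomes one output row
                out.write('\t'.join(col) + '\n')

Memory use is bounded by chunk_width columns at a time, at the cost of re-reading the input once per chunk.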
If you have Python with NumPy installed, it's as easy as this:
#!/usr/bin/env python
import csv
import numpy

# Read the tab-separated grid into a 2D array of strings
with open('/path/to/data.csv', newline='') as f:
    data = numpy.array(list(csv.reader(f, delimiter='\t')))

# First row becomes first column
transpose = data.T
numpy.savetxt('/path/to/output.csv', transpose, fmt='%s', delimiter='\t')
Note that the csv module is part of Python's standard library.

What is Excel 2007 workbook Name size limit? Why?

Workbook names in Excel 2007 are supposed to be limited in size only by memory, but this appears not to be the case. Eventually, an array saved to a name will get big enough that when you try to save you get (paraphrased): "one or more of the formulas in this workbook is larger than the 8192 character limit, please save as binary file".
OK, so then save it to a binary file format... but even here, an array can get big enough to make saving the file impossible.
What gives? How are names being stored within Excel that this occurs? Is this something particular to the installation? Is there a way around it?
Try it out yourself with the code below. It will run perfectly and the name will be properly populated, but saving will give you some nasty errors. 3351 elements is too many, but 3350 saves just fine:
Public Sub TestNameLimits()
    Dim v As Variant
    ReDim v(1)

    Dim index As Integer
    For index = 1 To 3351
        ReDim Preserve v(index)
        v(index) = "AAAA"
    Next

    Call Application.Names.Add("NameLimit", v)
End Sub
The Names collection is a feature of Excel that has been around for a very long time. The formula length limitation in Excel 2003 is 1,024 (2^10) but was expanded for Excel 2007 to 8,192 (2^13).
These two articles describe the main size limitations for Excel 2003 and Excel 2007:
Excel 2003 specifications and limits
Excel 2007 specifications and limits
To solve this, I would have a look at the Excel.Worksheet.CustomProperties collection. I believe that the Worksheet.CustomProperties item size is limited only by memory. You will have to test this on your system, and probably in various versions of Excel as well, but I think you should easily be able to save well over 10,000,000 characters.
However, when using the Worksheet.CustomProperties collection, you will be responsible for converting your array to and from a string yourself, unlike the Names collection, which can convert your array to a string automatically.
