Is there a way to configure the filename for a Neo4j Desktop database dump file to exclude the timestamp? - neo4j

I'm a first time user of Neo4j and following a training course to install and learn the basics.
I've installed Neo4j Desktop on a Windows machine and can see that it comes with a demo DB called "Movie DBMS". I'm trying to follow the steps to dump the database: stopping the database, clicking on "..." and then "Dump".
The dump fails with the following error in the log file:
[2022-01-31 12:54:36.022] [error] Selecting JVM - Version:11.0.8+10-LTS, Name:OpenJDK 64-Bit Server VM, Vendor:Azul Systems, Inc.
java.nio.file.InvalidPathException: Illegal char <:> at index 128: C:\Users\<me>\.Neo4jDesktop\relate-data\projects\<my project name>\movie-dbms-neo4j-31-Jan-2022-12:54:31.dump
It would appear that the automatic configuration for the dump file adds a timestamp which includes colons (hh:mm:ss). How can I configure the file name to either exclude the timestamp or avoid using ":"?
Thanks.

I had no responses, but I've figured it out myself.
The answer was to use the command line to dump the database manually. That way I can specify my own filename via "--to=" which doesn't include a ":".
Details in this section of the manual: https://neo4j.com/docs/operations-manual/current/backup-restore/offline-backup/#offline-backup
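For example, something along these lines from the DBMS terminal in Neo4j Desktop (a sketch - the database name and output filename here are placeholders, the syntax is for a 4.x neo4j-admin, and the only requirement is that the filename contains no ":"):
bin\neo4j-admin dump --database=neo4j --to=movie-dbms-31-Jan-2022.dump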

Related

Is there any way to load dump files in Neo4j Desktop 5.1.0 in Windows 11?

I have tried all possible ways to load the pole.dump file into Neo4j:
I have been doing the following for the past 3 days now:
Opened Neo4j Desktop and, using the Add drop-down menu, added pole.dump to Neo4j Desktop.
Then I selected "Import dump into existing DBMS" -> which is my Graph3 database.
Then I go to Neo4j Desktop and, from the Database Information panel, select the pole database, but I get this error:
Database "pole" is unavailable, its status is "offline".
I also tried this: https://community.neo4j.com/t5/graphacademy-discussions/cannot-create-new-database-from-dump-file/td-p/39914
i. Database-->Open Folder-->DBMS. Here you will see data/dumps folder
ii. Copy pole.dump file to data/dumps folder (Although there is no folder called dumps in the data folder)
iii. Close the browser. Click on ... and select Terminal.
iv. Terminal window shows up. Enter this command:
bin/neo4j-admin load --from=data/dumps/pole.dump --database=pole --force
v. If successful, close the Terminal window and open the db in browser.
vi. Click on the database icon on top left to see the databases from the dropdown box.
Here you will not see pole db.
vii. Select 'system' database. On the right pane run this Cypher:
CREATE DATABASE pole and execute the Cypher.
viii. Run SHOW DATABASES and you should see pole and check the status. Status should be 'online'.
ix. Select pole from the dropdown list. Once selected you should see all the nodes,
relationships on the left. Now you can start playing with it!!
But I could not get past point iv: the Neo4j terminal, if I open it from Neo4j Desktop, says it could not load - in fact it says there is a parsing error.
I did check with the following:
C:\Users\Chirantan\.Neo4jDesktop\relate-data\dbmss\dbms-11aabb23-daca-4d35-9043-6c039d133a34\bin>neo4j-import Graph3 load --from=data/dumps/pole.dump
'neo4j-import' is not recognized as an internal or external command,
operable program or batch file.
I am coming to this platform because I have tried everything available:
https://neo4j.com/docs/operations-manual/current/tools/neo4j-admin/
'neo4j-admin' is not recognized as an internal or external command, operable program or batch file
https://www.youtube.com/watch?v=HPwPh5FUvAk
But I have had no luck.
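(Side note on the "not recognized" errors above: the admin tool is neo4j-admin, not neo4j-import, and from a Windows cmd prompt it needs a path the shell can resolve. A sketch, run from the DBMS folder rather than bin, assuming a 4.x DBMS:
bin\neo4j-admin load --from=data\dumps\pole.dump --database=pole --force
On a 5.x DBMS the subcommand is different and the dump is referenced by its containing folder, roughly:
bin\neo4j-admin database load pole --from-path=data\dumps )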
After step 3, which is:
Then I go to Neo4j Desktop and, from the Database Information panel, select the pole database, but I get this error:
Did you try to start the database with the following command?
START DATABASE pole;
I have already solved the problem. The issue persisted even after I followed the steps provided in the OP. What I did was: I created random text for all records (names of criminals/victims, friends of victims, friends of criminals), generated random phone numbers, generated random NHS numbers, and also generated random addresses, using:
https://fossbytes.com/tools/random-name-generator
https://www.randomlists.com/london-addresses?qty=699
Using this code I generated random NHS ids:
import string
import random

# length of each generated id
N = 7
list_str = []
for i in range(699):
    # generate a random string of uppercase letters and digits using random.choices()
    res = ''.join(random.choices(string.ascii_uppercase + string.digits, k=N))
    list_str.append(res)
Random phone numbers were generated using:
https://fakenumber.in/united-kingdom
There is a better answer.
Go to this URL: https://neo4j.com/sandbox/
Then select one of the pre-installed databases that come with the sandbox - Crime Investigation is one of them, and it has the POLE database pre-installed.
You will be prompted to open the home from there.
Finally, open Neo4j Browser from there using the drop-down menu on the Open button, and voila! You can access the POLE database using Neo4j Browser.

INSERT INTO ... in MariaDB in Ubuntu under Windows WSL2 results in corrupted data in some columns

I am migrating a MariaDB database into a Linux docker container.
I am using mariadb:latest in Ubuntu 20 LTS via Windows 10 WSL2 via VSCode Remote WSL.
I have copied the sql dump into the container and imported it into the InnoDB database which has DEFAULT CHARACTER SET utf8. It does not report any errors:
> source /test.sql
That file does this (actual data truncated for this post):
USE `mydb`;
DROP TABLE IF EXISTS `opsitemtest`;
CREATE TABLE `opsitemtest` (
`opId` int(11) NOT NULL AUTO_INCREMENT,
`opKey` varchar(50) DEFAULT NULL,
`opName` varchar(200) DEFAULT NULL,
`opDetails` longtext,
PRIMARY KEY (`opId`),
KEY `token` (`opKey`)
) ENGINE=InnoDB AUTO_INCREMENT=4784 DEFAULT CHARSET=latin1;
insert into `opsitemtest`(`opId`,`opKey`,`opName`,`opDetails`) values
(4773,'8vlte0755dj','VTools addin for MSAccess','<p>There is a super helpful ...'),
(4774,'8vttlcr2fTA','BAS OLD QB','<ol>\n<li><a href=\"https://www.anz.com/inetbank/bankmain.asp\" ...'),
(4783,'9c7id5rmxGK','STP - Single Touch Payrol','<h1>Gather data</h1>\n<ol style=\"list-style-type: decimal;\"> ...');
If I source a subset of 12 records of the table in question, all the columns are correctly populated.
If I source the full set of data for the same table (4700 rows), where everything else is the same, many of the opDetails long text fields show a length in SQLyog but no data is visible. If I run a SELECT on that column there are no errors, but some of the opDetails fields are "empty" (meaning: you can't see any data), and when I serialize that field, the opDetails column of some records (not all) has
"opDetails" : "\u0000\u0000\u0000\u0000\u0000\u0000\",
( and many more \u0000 ).
The opDetails field contains HTML fragments. I am guessing it is something to do with that content and possibly the CHARSET, although that doesn't explain why the error shows up only when there are a large number of rows imported. The same row imported via a set of 12 rows works correctly.
The same test of the full set of data on a Windows box with MariaDB running on that host (ie no Ubuntu or WSL etc) all works perfectly.
I tried setting the table charset to utf8 to match the database default, but that had no effect. I assume it is some kind of Windows WSL issue, but I am running the source command in the container, all within the Ubuntu host.
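(For clarity, by setting the table charset I mean something along these lines - a sketch using the table from the dump above:
ALTER TABLE opsitemtest CONVERT TO CHARACTER SET utf8;
CONVERT TO also converts existing column data; ALTER TABLE ... DEFAULT CHARACTER SET utf8 would only change the default for new columns.)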
The MariaDB data folder is mapped using a volume, again all inside the Ubuntu container:
volumes:
  - ../flowt-docker-volumes/mariadb-data:/var/lib/mysql
Can anyone offer any suggestions while I go through and try manually removing content until it works? I am really in the dark here.
EDIT: I just ran the same import process on a Mac to a MariaDB container on the OSX host to check whether it was actually related to Windows WSL etc and the OSX database has the same issue. So maybe it is a MariaDB docker issue?
EDIT 2: It looks like it has nothing to do with the actual content of opDetails. For a given row that is showing the symptoms, whether or not the data gets imported correctly seems to depend on how many rows I am importing! For a small number of rows, all is well. For a large number there is missing data, but always the same rows and opDetails field. I will try importing in small chunks but overall the table isn't THAT big!
EDIT 3: I tried a docker-compose without a volume and imported the data directly into the MariaDB container. Same problem. I was wondering whether it was a file system incompatibility or some kind of speed issue. Yes, grasping at straws!
Thanks,
Murray
OK. I got it working. :-)
One piece of info I neglected to mention, and it might not be relevant anyway, is that I was importing from an SQL dump from 10.1.48-MariaDB-0ubuntu0.18.04.1 because I was migrating a legacy app.
So, with my docker-compose:
Version          Result
mysql:latest     data imported correctly
mariadb:latest   failed as per this issue
mariadb:10.7.4   failed as per this issue
mariadb:10.7     failed as per this issue
mariadb:10.6     data imported correctly
mariadb:10.5     data imported correctly
mariadb:10.2     data imported correctly
Important: remember to completely remove the external volume mount folder content between tests!
So, now I am not sure whether the issue was some kind of SQL incompatibility that I need to be aware of, or whether it is a bug that was introduced between v10.6 and 10.7. Therefore I have not logged a bug report. If others with more expertise think this is a bug, I am happy to make a report.
For now I am happy to use 10.6 so I can progress the migration - the deadline is looming!
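In case it helps anyone following along, pinning the image in the compose file looks roughly like this (a sketch - the service name is a placeholder, the volume path is the one from above):
services:
  mariadb:
    image: mariadb:10.6
    volumes:
      - ../flowt-docker-volumes/mariadb-data:/var/lib/mysql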
So, this is sort of "solved".
Thanks for all your help. If I discover anything further I will post back here.
Murray

Error adding existing repository onto PlasticSCM server

I'm trying to add some existing repositories into my PlasticSCM server so I can migrate the back end.
I'm using the following to attempt to do it:
cm addrep rep_11.plastic r11 localhost:8087
However this gives me the following error message:
The id specified in the repository database rep_11.plastic is not correct.
It should be a number.
I can't seem to find this error message listed anywhere online, and I can't find any obvious non-numeric ID field in the database itself.
I've tried it with a number of them and they all give the same message. I'm not actually 100% sure which is which, so I'm using a generic repository name (r11) until I can have a look around them, but I'd assume that would be OK.
This is with default settings, so the SQLite backend. I really need to get all these imported into it so they can be migrated to MySQL.
According to the command help:
Usage:
cm addrepository | addrep db_file rep_name repsvr_spec
db_file The name of the database file on the database backend.
rep_name The name of the repository.
repsvr_spec Repository server specification:
[repserver:]svr_name:svr_port
In your example, please run:
cm addrep rep_11 r11 localhost:8087

Hadoop Informatica Log processing

I am working on a project involving creating a queryable set of data from fairly large Informatica log files. To do so, the files are imported into a Hadoop cluster using Flume, which was already configured by a coworker before I began this project. My job is to create a table from the data contained within the logs so that queries can be performed easily. The issue I'm encountering has to do with log file formatting. The logs are in the format:
Timestamp : Severity : (Pid | Thread) : (ServiceType | ServiceName) : ClientNode : MessageCode : Message
The issue is that sometimes the message field contains additional colon-delimited comments; for example, a message could be [ x : y : z ]. When using HCatalog to create the table I cannot account for this behavior, and it results in additional columns.
Any suggestions? Normally I would use Ruby to separate the fields or replace the delimiter to keep the integrity when importing using HCatalog. Is there some pre-processing I can do cluster-side that would allow me to do this? The files are too large to handle locally.
The answer was to use a Pig script and a Python UDF. The Pig script loads the file, then calls the Python script line by line to break the fields apart properly. The result can then be written to a friendlier CSV and/or stored in a table.
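To give a rough idea of the splitting step, here is a sketch of the core logic in plain Python (the wiring into Pig as a UDF is omitted; the field names and the " : " delimiter are taken from the format described above):
def split_log_line(line):
    # Only the first six " : " separators delimit fixed fields; anything after
    # the sixth stays inside Message, even if it contains more colons.
    parts = line.strip().split(" : ", 6)
    if len(parts) < 7:
        return None  # malformed line; skip or handle as needed
    timestamp, severity, pid_thread, service, client_node, code, message = parts
    return (timestamp, severity, pid_thread, service, client_node, code, message)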

Deedle - what's the schema format for readCsv

I was using Deedle in F# to read a txt file (no header) into a data frame, and cannot find any example of how to specify the schema.
let df= Frame.ReadCsv(datafile, separators="\t", hasHeaders=false, schema=schema)
I tried to give a string with names separated by ',', but it doesn't seem to work.
let schema = @"name, age, address";
I did some searching in the docs, but only found the following - I don't know where I can find the info. :(
schema - A string that specifies CSV schema. See the documentation
for information about the schema format.
The schema format is the same as in the CSV type provider in F# Data.
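For example, once the parameter is honoured, something along these lines should work (a sketch - the column names and types are just illustrative, using the CSV provider's "Name (type)" convention):
let df = Frame.ReadCsv(datafile, separators="\t", hasHeaders=false, schema="name (string), age (int), address (string)")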
The only problem (quite important!) is that the Deedle library had a bug where it completely ignored the schema parameter, so no matter what you provided, it had no effect.
I just submitted a pull request that fixes the bug and also includes some examples (in the form of unit tests). See the pull request here (and click on "Files changed" to see the samples).
If you do not want to wait for a new release, just get the code from my GitHub fork and build it using build.cmd in the root (run it once first to restore packages). The complete build requires a local installation of R (because it builds the R plugin too), but it should build Deedle.dll before failing... (After the first run of build.cmd, you can just use the Deedle.sln solution.)