How to change a /P address in a PDF structure element object with pikepdf - pikepdf

Is it possible with pikepdf to change the /P address in a structure element object, e.g. from /P 166 0 R to /P 59 0 R?
I am using pikepdf in a script to improve PDF accessibility. MS Word splits a list that spans two pages into two /L structure elements in the PDF, which is an accessibility error.
So the /LI elements of the second /L need to be moved to the first /L, and their /P address needs to be changed too. Is there an example of how to change the addresses?
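A minimal sketch of how this might look with pikepdf, not a tested recipe. It assumes each /L element is a dictionary whose /K entry is an array of /LI dictionaries (no bare MCID integers), and that objects 59 and 166 from the question are the first and second /L. The names `merge_list_elements` and `fix_file` are hypothetical helpers, not part of pikepdf. Assigning an indirect object to /P should store a reference to it, which is what turns /P 166 0 R into /P 59 0 R.

```python
def merge_list_elements(first_l, second_l):
    """Move every child of second_l into first_l and re-point its /P.

    Assumption: both /K entries are arrays of structure element
    dictionaries. Only bracket access is used, so the function works on
    pikepdf objects and, for experimenting, on plain dicts and lists.
    """
    moved = 0
    for item in list(second_l["/K"]):  # copy first: the arrays change below
        item["/P"] = first_l           # new parent reference
        first_l["/K"].append(item)
        moved += 1
    second_l["/K"] = []                # the second /L no longer owns the items
    return moved


def fix_file(src, dst, first_objnum=59, second_objnum=166):
    """Hypothetical driver; the object numbers are the ones from the question."""
    import pikepdf  # third-party package: pip install pikepdf

    with pikepdf.open(src) as pdf:
        first_l = pdf.get_object((first_objnum, 0))
        second_l = pdf.get_object((second_objnum, 0))
        merge_list_elements(first_l, second_l)
        pdf.save(dst)
```

The emptied second /L is still listed in the /K array of its own parent; removing it from there is left out of this sketch.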

Related

How can I move file to a folder based on a year value that is part of the filename

We have almost 2.5 million files of archive data that need to be organized according to the year in which they were created. We need to move files from their current folder on our NAS to another folder that is the year in which the file was created. The destination folder is a four character year value (2003, 2004, etc.). The filename is in the format AAAAAAAAA_YYYYMMDD_BBBBBB.dfa where YYYY is the year value in which the file was created. The file extension can be either .dfa or .dfc. Folders for the appropriate year already exist, but files that are incorrectly placed in the wrong year must be moved to the appropriate year folder.
I need a batch file that will move files from their current location to the appropriate year folder on the NAS, but do not know how to parse the year value from the filename to move the file to the proper year.
Could someone help me with a batch file or script that will do this?
The following batch file walks through the root directory given by variable ROOT recursively and moves files into the appropriate year-folder:
@echo off
rem specify the root directory here (the directory containing the year folders):
set ROOT="."
rem define the file search pattern(s) here:
set PATTERNS="*.dfa" "*.dfc"
rem set this to non-empty for flexible file name parsing:
set FLEXMODE=
rem set this to a log file path
rem (log contains date/time, TRUE/FALSE for move success/failure, source, dest.):
set LOGF=".\movement.log"
setlocal EnableDelayedExpansion
rem loop through every file recursively
for /R %ROOT% %%F in (%PATTERNS%) do (
rem extract parent folder name
set PARENT=%%~dpF
set PARENT=!PARENT:~-5,4!
rem parse file name, extract year portion
if defined FLEXMODE (
for /F "tokens=2 delims=_" %%N in ("%%~nF") do (
set YEAR=%%N
set YEAR=!YEAR:~,4!
)
) else (
set YEAR=%%~nF
set YEAR=!YEAR:~10,4!
)
rem check whether parent folder name equals year portion of file name
if not "!PARENT!"=="!YEAR!" (
rem move file if not in appropriate year folder (no overwrite)
if not exist "%%~dpF..\!YEAR!\%%~nxF" (
move /Y "%%~fF" "%%~dpF..\!YEAR!" > nul
echo !DATE!, !TIME! TRUE "%%~fF" "%%~dpF..\!YEAR!\%%~nxF"
) else (
echo !DATE!, !TIME! FALSE "%%~fF" "%%~dpF..\!YEAR!\%%~nxF"
)
) >> %LOGF%
)
endlocal
Pre-Requisites:
set the variables in the beginning block accordingly:
ROOT: full root directory path;
PATTERNS: file pattern(s) to use for searching;
FLEXMODE: set this to a non-empty value if the AAAAAAAAA portion in your file names AAAAAAAAA_YYYYMMDD_BBBBBB.* may vary in length; in such cases, the first underscore _ is used to find the year YYYY; otherwise (empty value), the year is extracted by its position;
LOGF: path and name of a log file that will contain four columns (separated by 4 spaces): date and time, TRUE/FALSE to indicate success/failure, source file path, destination file path; files that are already placed correctly are not logged here;
the year folders are placed as immediate children of the given root directory;
all files are located within a year-folder (wrong or right year);
you have sufficient access privileges within the entire root directory tree;
for script testing, place rem in front of the move command and review the log file;
Explanation:
core element is the for /R loop that walks through the directory hierarchy;
for each file, variable PARENT is set to the immediate parent directory name (the ~dp modifier in %%~dpF extracts drive and path, including a trailing backslash);
depending on FLEXMODE, the year YYYY portion is extracted from the file name (if FLEXMODE is defined, a for /F loop is used to parse the file name and split it into underscore-delimited tokens, the first 4 characters of the second token are the year, stored in YEAR; if FLEXMODE is empty, 4 characters at offset 10 are extracted and stored in YEAR);
next the extracted YEAR is checked against the parent directory name PARENT; if equal, do nothing else and go to next for /R iteration; otherwise, the file is moved but only if the destination does not yet exist;
finally, a log string is generated and returned by echo, which is then redirected to the specified log file;
delayed expansion is required to modify variable values (PARENT, YEAR) and read them back within a loop structure (or any other parenthesized code block);
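For readers who can run Python instead of batch, the same logic can be sketched as follows. This is an illustration under the prerequisites listed above (year folders are immediate children of the root and already exist), not a drop-in replacement; `year_from_name` and `move_misplaced` are hypothetical names, and `dry_run=True` plays the role of putting rem in front of the move command.

```python
import shutil
from pathlib import Path

SUFFIXES = {".dfa", ".dfc"}  # file types named in the question


def year_from_name(filename, flexible=False):
    """Return the YYYY part of AAAAAAAAA_YYYYMMDD_BBBBBB.ext."""
    stem = Path(filename).stem
    if flexible:
        # Like FLEXMODE: first 4 characters of the second "_"-separated token.
        return stem.split("_")[1][:4]
    # Fixed layout: 4 characters starting at offset 10.
    return stem[10:14]


def move_misplaced(root, dry_run=True):
    """Return (source, destination) pairs; move the files unless dry_run."""
    root = Path(root)
    moves = []
    for src in sorted(root.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in SUFFIXES:
            continue
        year = year_from_name(src.name)
        if src.parent.name == year:
            continue  # already in the right year folder
        dest = root / year / src.name
        if dest.exists():
            continue  # never overwrite an existing file
        moves.append((src, dest))
        if not dry_run:
            shutil.move(str(src), str(dest))
    return moves
```

Calling `move_misplaced(r"\\nas\archive")` (a made-up path) first and reviewing the returned pairs is a cheap way to check the parsing before anything is moved.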

Speed up my batch file parsing

I have a batch file that takes input from a txt file that looks like this..
Microsoft (R) Windows Script Host Version 5.8
Copyright (C) Microsoft Corporation. All rights reserved.
Server name lak-print01
Printer name Microsoft XPS Document Writer
Share name
Driver name Microsoft XPS Document Writer
Port name XPSPort:
Comment
Location
Print processor WinPrint
Data type RAW
Parameters
Attributes 64
Priority 1
Default priority 1
Average pages per minute 0
Printer status Idle
Extended printer status Unknown
Detected error state Unknown
Extended detected error state Unknown
Server name lak-print01
Printer name 4250_Q1
Share name 4250_Q1
Driver name Canon iR5055/iR5065 PCL5e
Port name IP_192.168.202.84
Comment Audit Department in Lakewood Operations
Location Operations Center
Print processor WinPrint
Data type RAW
Parameters
Attributes 10826
Priority 1
Default priority 0
Average pages per minute 0
Printer status Idle
Extended printer status Unknown
Detected error state Unknown
Extended detected error state Unknown
Server name lak-print01
Printer name 3130_Q1
Share name 3130_Q1
Driver name Canon iR1020/1024/1025 PCL5e
Port name IP_192.168.202.11
Comment Canon iR1025
Location Operations Center
Print processor WinPrint
Data type RAW
Parameters
Attributes 10824
Priority 1
Default priority 0
Average pages per minute 0
Printer status Idle
Extended printer status Unknown
Detected error state Unknown
Extended detected error state Unknown
and parses it to get certain things in the list, like server name, printer name, driver name, etc., and then puts each block entry into its own comma-delimited row. So I can have multiple rows, one per block of text, with each column holding a particular piece of information. Some of these txt files have 100+ entries. When it gets to parsing, each file I try to parse takes 5-10 minutes.
The Parse code is as follows.
:Parselak-print01
SETLOCAL enabledelayedexpansion
:: remove variables starting $
FOR /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a="
(FOR /f "delims=" %%a IN (lak-print01.txt) DO CALL :analyse "%%a")>lak-print01.csv
attrib +h lak-print01.csv
GOTO :EOF
:analyse
SET "line=%~1"
SET /a fieldnum=0
FOR %%s IN ("Server name" "Printer name" "Driver name"
"Port name" "Location" "Comment" "Printer status"
"Extended detected error state") DO CALL :setfield %%~s
GOTO :eof
:setfield
SET /a fieldnum+=1
SET "linem=!line:*%* =!"
SET "linet=%* %linem%"
IF "%linet%" neq "%line%" GOTO :EOF
IF "%linem%"=="%line%" GOTO :EOF
SET "$%fieldnum%=%linem%"
IF NOT DEFINED $8 GOTO :EOF
SET "line="
FOR /l %%q IN (1,1,7) DO SET "line=!line!,!$%%q!"
ECHO !line:~1!
:: remove variables starting $
FOR /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a="
GOTO :eof
and the output I get is
lak-print01,Microsoft XPS Document Writer,Microsoft XPS Document Writer,XPSPort:,,,Idle
lak-print01,4250_Q1,Canon iR5055/iR5065 PCL5e,IP_192.168.202.84,Operations Center,Audit Department in Lakewood Operations,Idle
lak-print01,3130_Q1,Canon iR1020/1024/1025 PCL5e,IP_192.168.202.11,Operations Center,Canon iR1025 ,Idle
lak-print01,1106_TRN,HP LaserJet P2050 Series PCL6,IP_172.16.10.97,Monroe,HP P2055DN,Idle
lak-print01,1101_TRN,HP LaserJet P2050 Series PCL6,IP_10.3.3.22,Burlington,Training Room printer,Idle
lak-print01,1096_Q3,Canon iR1020/1024/1025 PCL5e,IP_192.168.96.248,Silverdale,Canon iR 1025,Idle
lak-print01,1096_Q2,Kyocera Mita KM-5035 KX,IP_192.168.96.13,Silverdale,Kyocera CS-5035 all in one,Idle
lak-print01,1096_Q1,HP LaserJet P4010_P4510 Series PCL 6,IP_192.168.96.12,Silverdale,HP 4015,Idle
lak-print01,1095_Q3,HP LaserJet P4010_P4510 Series PCL 6,IP_192.168.95.247,Sequim,HP LaserJet 4015x,Idle
Everything is perfect, and the code works as intended, but it's just super freaking slow!
How do I speed this up? The problem is that there is no true delimiter and the tokens vary; for instance, Comment needs token 2, but Printer name needs token 3.
Any help to increase the speed of parsing would be appreciated. The program works perfectly, but it is super slow during parsing.
If speed is what you need, I'd suggest Marpa, a general BNF parser, in Perl.
It would take some time to get used to, but it does the job and gives you a very powerful tool you can use easily; note how naturally the grammar resembles the input.
Hope this helps.
Using Call is very slow - see if this gives you the output you need, and it will be interesting to hear how much quicker it is in comparison.
@echo off
:Parselak-print01
SETLOCAL enabledelayedexpansion
(FOR /f "delims=" %%a IN (lak-print01.txt) DO (
for /f "tokens=1,2,*" %%b in ("%%a") do (
if "%%b"=="Server" set "server=%%d"
if "%%b"=="Printer" if "%%c"=="name" (set "printer=%%d") else (set "printerstatus=%%d")
if "%%b"=="Driver" set "driver=%%d"
if "%%b"=="Port" set "port=%%d"
if "%%b"=="Location" for /f "tokens=1,*" %%e in ("%%a") do set "location=%%f"
if "%%b"=="Comment" for /f "tokens=1,*" %%e in ("%%a") do set "comment=%%f"
if "%%b"=="Extended" for /f "tokens=1-4,*" %%e in ("%%a") do if "%%f"=="detected" set "extendeddetected=%%i"
)
if defined extendeddetected (
echo !server!,!printer!,!driver!,!port!,!location!,!comment!,!printerstatus!,!extendeddetected!
set "server="
set "printer="
set "driver="
set "port="
set "location="
set "comment="
set "printerstatus="
set "extendeddetected="
)
))>lak-print01.csv
attrib +h lak-print01.csv
pause
The solution below assumes that the input file has a fixed format, that is, two header lines followed by blocks of 18 lines that always appear in the same order. If this is true, this solution generates the output very quickly; otherwise, it must be modified accordingly.
@echo off
setlocal EnableDelayedExpansion
rem Create the array of variable names for the *desired rows* of data in the file
set "row[1]=Server name"
set "row[2]=Printer name"
set "row[4]=Driver name"
set "row[5]=Port name"
set "row[6]=Comment"
set "row[7]=Location"
set "row[15]=Printer status"
set i=0
(for /F "skip=2 delims=" %%a in (lak-print01.txt) do (
set /A i+=1
if defined row[!i!] (
set "line=%%a"
for %%i in (!i!) do for /F "delims=" %%v in ("!row[%%i]!") do set "%%v=!line:*%%v =!"
)
if !i! equ 18 (
echo !Server name!,!Printer name!,!Driver name!,!Port name!,!Location!,!Comment!,!Printer status!
set i=0
)
)) > lak-print01.csv
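For comparison outside cmd.exe, the field-name matching described in the answers above can be sketched in Python. This is only an illustration, assuming every block ends with an "Extended detected error state" line as in the sample; `parse_printers` and `rows_to_csv` are hypothetical names. It matches on the full field name rather than on token positions, so it does not depend on the 18-line layout.

```python
import csv
import io

# Output columns, in the order of the asker's CSV.
FIELDS = ["Server name", "Printer name", "Driver name", "Port name",
          "Location", "Comment", "Printer status"]
END_OF_BLOCK = "Extended detected error state"


def parse_printers(lines):
    """Return one list of values per printer block."""
    rows, record = [], {}
    for raw in lines:
        line = raw.strip()
        for name in FIELDS:
            if line == name or line.startswith(name + " "):
                record[name] = line[len(name):].strip()
                break
        else:
            # No field matched: check whether this line closes a block.
            if line.startswith(END_OF_BLOCK) and record:
                rows.append([record.get(name, "") for name in FIELDS])
                record = {}
    return rows


def rows_to_csv(rows):
    """Render rows as CSV text; values containing commas get quoted."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

Usage would be along the lines of `rows_to_csv(parse_printers(open("lak-print01.txt")))`, with the result written to lak-print01.csv.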

Need help writing a batch file to read a MS Access .ldb lock file with null delimiters

I am trying to create a batch file to read a Microsoft Access .ldb lock file. The lock file contains a list of computer names and user names. I want to extract the computer names and eventually run them against an external command.
The format of the lock file is a single row with
(1) a computer name
(2) a NULL character (Hex 00)
(3) approximately 20 spaces
(4) the user name
(5) a NULL character
(6) approximately 20 spaces
repeating.
Example in Notepad++ with (NUL) representing Hex 00:
COMPUTER0123(NUL) Admin(NUL) COMPUTER0507(NUL) Admin(NUL)
I've tried several methods using FOR to read the file but can't get past the first computer name.
setlocal EnableDelayedExpansion
set file=database.ldb
for /F %%a in ('type %file%') do (
echo %%a
)
For most of my Access databases, the user name in the file is Admin. I've been able to use FIND to tell me how many occurrences of "Admin" are in the file (plus 1).
for /f "delims=" %%n in ('find /c /v "Admin" %file%') do set "len=%%n"
set "len=!len:*:=!"
echo %len% (minus 1) computer names to process
<%file% (
for /l %%l in (1 1 !len!) do (
set "line="
set /p "line="
echo(!line!)
)
)
Iterating through the found lines doesn't work, probably because there only is one line in the file (no carriage returns).
I would like to find a solution that would work with a standard install of Windows XP.
After receiving an accepted answer, I combined that into a batch file that I'm posting below. I named the file ShowUsersInLDB.bat and put it in my SendTo folder.
@echo off
::===================================================================
:: Put this in your SendTo folder and it will let you right-click
:: on an Access .ldb/.laccdb lock file and tell you the computer
:: names that have opened the database.
::
:: After the computer names are shown, this will prompt you to
:: search for the user names associated with each computer. This
:: depends upon finding a 3rd party file named NetUsers.exe in
:: the user profile folder. Feel free to change the path if you
:: want to store the file in another location.
::
:: NetUsers.exe can be downloaded from here: http://www.optimumx.com/downloads.html#NetUsers
::
:: Notes:
:: 1) Keep in mind that sometimes after people leave the database
:: the lock file still shows their computer name. Don't jump
:: to conclusions.
:: 2) NetUsers.exe seems to report all users who have logged on
:: to the computer and not logged off, including services.
:: If you aren't familiar with your user names or your users are
:: sharing remote desktops/Citrix/Terminal Services, you may have
:: to guess who might have created the lock entry.
::
:: Installation:
:: You may find a batch file named Install_UsersInLDB.bat that will
:: copy this file to the SendTo folder and the NetUsers.exe file to
:: the user profile (or a place you define).
::
:: Ben Sacherich - March 2014
:: Please let me know if you have any ideas for improvements.
::===================================================================
setlocal
set "file=%~1"
:: Make sure the file has a compatible extension.
if "%~x1"==".ldb" goto :ExtensionIsValid
if "%~x1"==".laccdb" goto :ExtensionIsValid
echo.
echo "%~n1%~x1" is not the correct file type.
echo.
pause
goto :End
:ExtensionIsValid
echo The Access "%~n1%~x1" file contains
echo the following computer names:
echo.
set "compNameLine=1"
for /f %%A in ('more "%file%"') do (
if defined compNameLine (
echo %%A
set "compNameLine="
) else set "compNameLine=1"
)
echo.
echo Are you ready to look up the user names on each computer?
pause
set "compNameLine=1"
for /f %%A in ('more "%file%"') do (
if defined compNameLine (
::echo %%A
"%userprofile%\netusers" \\%%A
set "compNameLine="
) else set "compNameLine=1"
)
echo.
echo -- Validation finished at %time%
pause
:End
exit
CMD.EXE generally does not play nicely with NUL bytes. But there are a few external commands that can handle NUL bytes.
You also have to worry about the length of the "line". CMD.EXE does not like lines longer than 8191 bytes.
I think your best bet is MORE since it converts NULs into new lines.
The following should echo your computer names.
@echo off
setlocal
set "file=database.ldb"
set "compNameLine=1"
for /f %%A in ('more "%file%"') do (
if defined compNameLine (
echo %%A
set "compNameLine="
) else set "compNameLine=1"
)
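Where Python is available (it is not part of a standard Windows XP install, so this does not meet the asker's constraint), the same idea can be sketched without relying on MORE. The sketch assumes the layout described in the question: computer name, NUL, padding, user name, NUL, padding, repeating, with no empty names. `computer_names` and `computer_names_from_file` are hypothetical helper names.

```python
def computer_names(data):
    """Return the computer names found in the raw bytes of a lock file."""
    # latin-1 maps every byte to a character, so decoding cannot fail.
    tokens = [t.strip() for t in data.decode("latin-1").split("\x00")]
    tokens = [t for t in tokens if t]  # drop padding-only pieces
    return tokens[0::2]                # names alternate: computer, user, ...


def computer_names_from_file(path):
    # Binary mode keeps the NUL bytes intact.
    with open(path, "rb") as fh:
        return computer_names(fh.read())
```

Like the batch loop, this simply takes every other token, so an entry with an empty user name would throw the alternation off.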

Extracting certain characters from the last line of text file using a .bat file

What I'm trying to accomplish here is to pull data from the last line of this file: ftp://ftp.nhc.noaa.gov/atcf/tcweb/invest_al902012.invest. I've managed to download it and save it as a script.txt file through a .bat file. I now want to extract the latitude (13.5N) and longitude (27.2W) as well as the pressure (1009) from the last line of the file and write them to a new file. I then used this code to do part of what I want:
@echo off
setlocal EnableDelayedExpansion
for /f "delims=" %%x in (script.txt) do (
set "previous=!last!"
set "last=%%x"
)
echo !previous!>> "test3.txt"
for /f "delims=*" %%x in (test3.txt) do (
set line=%%x
set chars=!line:~35,-125!
echo !chars!>> "test.txt"
)
I'm illiterate when it comes to batch coding. This is probably extremely inefficient and only extracts the latitude part of the code I want. The file will always contain the same amount of characters in the last line so I'm thinking I'm just not grasping the concept of the !line part of the code. Any help is greatly appreciated.
The file is comma delimited, so it is probably easier to let FOR /F parse the line into tokens and keep just the ones you want.
This really simple solution parses and sets values for each line, but only the last line is remembered. The performance should be fine as long as the file never becomes huge.
@echo off
for /f "tokens=7,8,10 delims=," %%A in (script.txt) do (
set lat=%%A
set long=%%B
set pres=%%C
)
echo latitude=%lat%, longitude=%long%, pressure=%pres%
If you want to strip off the spaces, then you could simply use search and replace.
echo latitude=%lat: =%, longitude=%long: =%, pressure=%pres: =%
I do not simply include space as a delimiter in the FOR /F statement because that can throw off the token counting when a value is sometimes blank and sometimes not.
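The token-based approach above carries over to other languages. A minimal Python sketch, assuming the same comma-separated layout with latitude, longitude and pressure in fields 7, 8 and 10 (the positions used in the FOR /F answer); `last_line_fields` is a hypothetical name:

```python
def last_line_fields(text, positions=(7, 8, 10)):
    """Return the 1-based comma-separated fields of the last non-empty line."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    fields = [f.strip() for f in lines[-1].split(",")]
    return [fields[p - 1] for p in positions]
```

One difference worth knowing: `split(",")` keeps empty fields, whereas FOR /F treats consecutive delimiters as one, so the numbering can differ on lines with truly empty values.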

Correct word-count of a LaTeX document

I'm currently searching for an application or a script that does a correct word count for a LaTeX document.
Up till now, I have only encountered scripts that only work on a single file but what I want is a script that can safely ignore LaTeX keywords and also traverse linked files...ie follow \include and \input links to produce a correct word-count for the whole document.
With vim, I currently use ggVGg CTRL+G but obviously that shows the count for the current file and does not ignore LaTeX keywords.
Does anyone know of any script (or application) that can do this job?
I use texcount. The webpage has a Perl script to download (and a manual).
It will include tex files that are included (\input or \include) in the document (see -inc), supports macros, and has many other nice features.
When following included files you will get detail about each separate file as well as a total. For example here is the total output for a 12 page document of mine:
TOTAL COUNT
Files: 20
Words in text: 4188
Words in headers: 26
Words in float captions: 404
Number of headers: 12
Number of floats: 7
Number of math inlines: 85
Number of math displayed: 19
If you're only interested in the total, use the -total argument.
I went with icio's comment and did a word-count on the pdf itself by piping the output of pdftotext to wc:
pdftotext file.pdf - | wc -w
latex file.tex
dvips -o - file.dvi | ps2ascii | wc -w
should give you a fairly accurate word count.
To add to @aioobe,
If you use pdflatex, just do
pdftops file.pdf
ps2ascii file.ps|wc -w
I compared this count to the count in Microsoft Word in a 1599 word document (according to Word). pdftotext produced a text with 1700+ words. texcount did not include the references and produced 1088 words. ps2ascii returned 1603 words. 4 more than in Word.
I say that's a pretty good count. I am not sure where the 4-word difference comes from, though. :)
In the Texmaker interface you can get the word count by right-clicking in the PDF preview.
Overleaf has a word count feature, in both Overleaf v2 and Overleaf v1.
I use the following VIM script:
function! WC()
let filename = expand("%")
let cmd = "detex " . filename . " | wc -w | perl -pe 'chomp; s/ +//;'"
let result = system(cmd)
echo result . " words"
endfunction
… but it doesn’t follow links. This would basically entail parsing the TeX file to get all linked files, wouldn’t it?
The advantage over the other answers is that it doesn’t have to produce an output file (PDF or PS) to compute the word count so it’s potentially (depending on usage) much more efficient.
Although icio’s comment is theoretically correct, I found that the above method gives quite accurate estimates for the number of words. For most texts, it’s well within the 5% margin that is used in many assignments.
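On the question of following links: collecting the linked files can be sketched in a few lines of Python. This is a rough illustration, not a TeX parser. It assumes \input{...} and \include{...} paths are relative to the main file's directory, and it does not skip commented-out lines. `collect_tex_files` is a hypothetical name.

```python
import re
from pathlib import Path

# Matches \input{name} and \include{name}, but not \includegraphics{...}.
LINK = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def collect_tex_files(main):
    """Return the main file plus every file it links to, in reading order."""
    root = Path(main).parent
    ordered = []

    def visit(path):
        if path.suffix != ".tex":
            path = path.with_name(path.name + ".tex")  # LaTeX adds .tex itself
        if path in ordered or not path.is_file():
            return  # already seen, or missing on disk
        ordered.append(path)
        text = path.read_text(encoding="utf-8", errors="replace")
        for target in LINK.findall(text):
            visit(root / target.strip())

    visit(Path(main))
    return ordered
```

Each returned path could then be fed through detex and wc -w as in the Vim function, and the counts summed.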
If the use of a vim plugin suits you, the vimtex plugin has integrated the texcount tool quite nicely.
Here is an excerpt from their documentation:
:VimtexCountLetters Shows the number of letters/characters or words in
:VimtexCountWords the current project or in the selected region. The
count is created with `texcount` through a call on
the main project file similar to: >
texcount -nosub -sum [-letter] -merge -q -1 FILE
<
Note: Default arguments may be controlled with
|g:vimtex_texcount_custom_arg|.
Note: One may access the information through the
function `vimtex#misc#wordcount(opts)`, where
`opts` is a dictionary with the following
keys (defaults indicated): >
'range' : [1, line('$')]
'count_letters' : 0/1
'detailed' : 0
<
If `detailed` is 0, then it only returns the
total count. This makes it possible to use for
e.g. statusline functions. If the `opts` dict
is not passed, then the defaults are assumed.
*VimtexCountLetters!*
*VimtexCountWords!*
:VimtexCountLetters! Similar to |VimtexCountLetters|/|VimtexCountWords|, but
:VimtexCountWords! show separate reports for included files. I.e.
presents the result of: >
texcount -nosub -sum [-letter] -inc FILE
<
*VimtexImapsList*
*<plug>(vimtex-imaps-list)*
The nice part about this is how extensible it is. On top of counting the number of words in your current file, you can make a visual selection (say two or three paragraphs) and then only apply the command to your selection.
For a very basic article class document I just look at the number of matches for a regex to find words. I use Sublime Text, so this method may not work for you in a different editor, but I just hit Ctrl+F (Command+F on Mac) and then, with regex enabled, search for
(^|\s+|"|((h|f|te){)|\()\w+
which should ignore text declaring a floating environment or captions on figures as well as most kinds of basic equations and \usepackage declarations, while including quotations and parentheticals. It also counts footnotes and \emphasized text and will count \hyperref links as one word. It's not perfect, but it's typically accurate to within a few dozen words or so. You could refine it to work for you, but a script is probably a better solution, since LaTeX source code isn't a regular language. Just thought I'd throw this up here.
