Hold apa_table position in Rmarkdown/papaja - latex

I have another Rmarkdown/papaja package question and would be very happy if anyone were willing to help :)
Like many people on the internet, I have trouble controlling the position of my tables.
I have the
floatsintext: yes
option included in the YAML header.
I know that there are LaTeX options such as fig.pos = "!H", for which I have to load the float package:
header-includes:
- usepackage\{float}
However, doing this I get the following error message:
! LaTeX Error: Unknown float option `H'.
which suggests to me that the float package could not be loaded?
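Note in passing that the header-includes entry as quoted above has the backslash in the wrong place (usepackage\{float} instead of \usepackage{float}); if the header really looks like that, pandoc never loads the float package at all, which by itself would explain this error. The conventional form is:

```yaml
header-includes:
  - \usepackage{float}
```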
(I do have MacTeX installed, and also the newest R version.)
What is also absolutely confusing to me is that tables (generated with apa_table()) appear in the (approximately) right position when I leave the default spacing, but only appear at the end of my document when I add
header-includes:
- \usepackage{setspace}
- \AtBeginEnvironment{tabular}{\singlespacing}
- \AtBeginEnvironment{lltable}{\singlespacing}
- \AtBeginEnvironment{tablenotes}{\singlespacing}
to the YAML header in order to control the spacing of my tables.
I would really appreciate any help!
Thanks in advance!
Edit:
I don't know if this serves the purpose, but if I create the following setup, my table appears at the end (and not where it should be):

---
title : "TITLE"
shorttitle : "short title"
author:
  - name : "me"
    affiliation : "1"
    corresponding : yes # Define only one corresponding author
    address : "x"
    email : "y"
    role: # Contributorship roles (e.g., CRediT, https://casrai.org/credit/)
      - Conceptualization
      - Writing - Original Draft Preparation
      - Writing - Review & Editing
  # - name : "Ernst-August Doelle"
  #   affiliation : "1,2"
  #   role:
  #     - Writing - Review & Editing
affiliation:
  - id : "1"
    institution : ""
  # - id : "2"
  #   institution : "Konstanz Business School"
authornote: |
  Enter author note here.
abstract: |
keywords : "keywords"
wordcount : "X"
bibliography :
floatsintext : yes
figurelist : no
tablelist : no
footnotelist : no
linenumbers : no
mask : no
draft : no
csl: "apa.csl"
documentclass : "apa7"
classoption : "man"
output : papaja::apa6_pdf
toc: true
header-includes:
- \usepackage{float}
- \usepackage{setspace}
- \AtBeginEnvironment{tabular}{\singlespacing}
- \AtBeginEnvironment{lltable}{\singlespacing}
- \AtBeginEnvironment{tablenotes}{\singlespacing}
---
```{r setup, include = FALSE}
library("papaja")
library("apa")
library("tidyverse")
library("apaTables")
r_refs("r-references.bib")
```
```{r analysis-preferences}
# Seed for random number generation
set.seed(42)
knitr::opts_chunk$set(cache.extra = knitr::rand_seed)
```
```{r}
cor_table <- apa.cor.table(iris)
```
Text BLABLABLABLA
```{r tab, results = "asis", fig.pos = "!h"}
apa_table(cor_table$table.body,
          caption = "Means, standard deviations, and correlations with confidence intervals for study variables.",
          note = "Note. M and SD are used to represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval. The confidence interval is a plausible range of population correlations that could have caused the sample correlation (Cumming, 2014). * indicates p < .05. ** indicates p < .01.",
          font_size = "footnotesize",
          row.names = FALSE,
          placement = "p")
```
# Methods
We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study. <!-- 21-word solution (Simmons, Nelson & Simonsohn, 2012; retrieved from http://ssrn.com/abstract=2160588) -->
## Participants
## Material
## Procedure
## Data analysis
We used `r cite_r("r-references.bib")` for all our analyses.
# Results
# Discussion
\newpage
# References
\begingroup
\setlength{\parindent}{-0.5in}
\setlength{\leftskip}{0.5in}
<div id="refs" custom-style="Bibliography"></div>
\endgroup

For PDF output, there is a recommended way to customize the placement of tables created via apa_table(). (You do not have to load the float package via header includes.)
First, set the YAML header option floatsintext: yes.
Second, when creating your table with apa_table(), use the function's placement argument:
```{r tab}
apa_table(cor_table$table.body,
          caption = "Means, standard deviations, and correlations with confidence intervals for study variables.",
          note = "Note. M and SD are used to represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval. The confidence interval is a plausible range of population correlations that could have caused the sample correlation (Cumming, 2014). * indicates p < .05. ** indicates p < .01.",
          font_size = "footnotesize",
          row.names = FALSE,
          placement = "H")
```


Different access methods to Pyro Paramstore give different results

I am following the Pyro introductory tutorial on forecasting, and when trying to access the learned parameters after training the model, I get different results for some of them depending on the access method (while getting identical results for others).
Here is the stripped-down reproducible code from the tutorial:
import torch
import pyro
import pyro.distributions as dist
from pyro.contrib.examples.bart import load_bart_od
from pyro.contrib.forecast import ForecastingModel, Forecaster
pyro.enable_validation(True)
pyro.clear_param_store()
pyro.__version__
# '1.3.1'
torch.__version__
# '1.5.0+cu101'
# import & prepare the data
dataset = load_bart_od()
T, O, D = dataset["counts"].shape
data = dataset["counts"][:T // (24 * 7) * 24 * 7].reshape(T // (24 * 7), -1).sum(-1).log()
data = data.unsqueeze(-1)
T0 = 0 # beginning
T2 = data.size(-2) # end
T1 = T2 - 52 # train/test split
# define the model class
class Model1(ForecastingModel):
    def model(self, zero_data, covariates):
        data_dim = zero_data.size(-1)
        feature_dim = covariates.size(-1)
        bias = pyro.sample("bias", dist.Normal(0, 10).expand([data_dim]).to_event(1))
        weight = pyro.sample("weight", dist.Normal(0, 0.1).expand([feature_dim]).to_event(1))
        prediction = bias + (weight * covariates).sum(-1, keepdim=True)
        assert prediction.shape[-2:] == zero_data.shape
        noise_scale = pyro.sample("noise_scale", dist.LogNormal(-5, 5).expand([1]).to_event(1))
        noise_dist = dist.Normal(0, noise_scale)
        self.predict(noise_dist, prediction)
# fit the model
pyro.set_rng_seed(1)
pyro.clear_param_store()
time = torch.arange(float(T2)) / 365
covariates = torch.stack([time], dim=-1)
forecaster = Forecaster(Model1(), data[:T1], covariates[:T1], learning_rate=0.1)
So far so good; now I want to inspect the learned latent parameters stored in the ParamStore. It seems there is more than one way to do this; using the get_all_param_names() method:
for name in pyro.get_param_store().get_all_param_names():
    print(name, pyro.param(name).data.numpy())
I get
AutoNormal.locs.bias [14.585433]
AutoNormal.scales.bias [0.00631594]
AutoNormal.locs.weight [0.11947815]
AutoNormal.scales.weight [0.00922901]
AutoNormal.locs.noise_scale [-2.0719821]
AutoNormal.scales.noise_scale [0.03469057]
But using the named_parameters() method:
pyro.get_param_store().named_parameters()
gives the same values for the location (locs) parameters, but different values for all scales ones:
dict_items([
('AutoNormal.locs.bias', Parameter containing: tensor([14.5854], requires_grad=True)),
('AutoNormal.scales.bias', Parameter containing: tensor([-5.0647], requires_grad=True)),
('AutoNormal.locs.weight', Parameter containing: tensor([0.1195], requires_grad=True)),
('AutoNormal.scales.weight', Parameter containing: tensor([-4.6854], requires_grad=True)),
('AutoNormal.locs.noise_scale', Parameter containing: tensor([-2.0720], requires_grad=True)),
('AutoNormal.scales.noise_scale', Parameter containing: tensor([-3.3613], requires_grad=True))
])
How is this possible? According to the documentation, Paramstore is a simple key-value store; and there are only these six keys in it:
pyro.get_param_store().get_all_param_names() # .keys() method gives identical result
# result
dict_keys([
'AutoNormal.locs.bias',
'AutoNormal.scales.bias',
'AutoNormal.locs.weight',
'AutoNormal.scales.weight',
'AutoNormal.locs.noise_scale',
'AutoNormal.scales.noise_scale'])
so there is no way that one method accesses one set of items and the other a different one.
Am I missing something here?
pyro.param() returns transformed parameters; in this case, the scales are transformed to the positive reals.
Here is the situation, as revealed in the GitHub thread I opened in parallel with this question...
The ParamStore is no longer just a simple key-value store; it also performs constraint transformations. Quoting a Pyro developer from the above link:
here's some historical background. The ParamStore was originally just a key-value store. Then we added support for constrained parameters; this introduced a new layer of separation between user-facing constrained values and internal unconstrained values. We created a new dict-like user-facing interface that exposed only constrained values, but to keep backwards compatibility with old code we kept the old interface around. The two interfaces are distinguished in the source files [...] but as you observe it looks like we forgot to mark the old interface as DEPRECATED.
I guess in clarifying docs we should:
clarify that the ParamStore is no longer a simple key-value store
but also performs constraint transforms;
mark all "old" style interface methods as DEPRECATED;
remove "old" style interface usage from examples and tutorials.
As a consequence, it turns out that, while pyro.param() returns the results in the constrained (user-facing) space, the older method named_parameters() returns the unconstrained (i.e. for internal use only) values, hence the apparent discrepancy.
Indeed, it is not difficult to verify that the scales values returned by the two methods above are related by a logarithmic transformation:
import numpy as np

items = list(pyro.get_param_store().named_parameters())  # unconstrained space

i = 0
for name in pyro.get_param_store().keys():
    if 'scales' in name:
        temp = np.log(
            pyro.param(name).item()  # constrained space
        )
        print(temp, items[i][1][0].item(), np.allclose(temp, items[i][1][0].item()))
    i += 1
# result:
-5.027793402915326 -5.0277934074401855 True
-4.600319371162187 -4.6003193855285645 True
-3.3920585732532835 -3.3920586109161377 True
Why does this discrepancy affect only the scales parameters? Because scales (essentially variances) are by definition constrained to be positive; that doesn't hold for locs (means), which are unconstrained, so the two representations coincide for them.
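The positivity constraint on a scale corresponds to a simple log/exp bijection between the two spaces; a minimal sketch in plain Python (the numeric value is taken from the named_parameters() listing above):

```python
import math

# Internal (unconstrained) value, as reported by named_parameters()
unconstrained = -3.3613

# The user-facing (constrained) value lives on the positive reals
constrained = math.exp(unconstrained)
assert constrained > 0

# Taking the log recovers the internal representation exactly
assert abs(math.log(constrained) - unconstrained) < 1e-12
```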
As a result of the question above, a new bullet has now been added to the ParamStore documentation, giving a relevant hint:
in general parameters are associated with both constrained and unconstrained values. for example, under the hood a parameter that is constrained to be positive is represented as an unconstrained tensor in log space.
as well as in the documentation of the named_parameters() method of the old interface:
Note that, in the event the parameter is constrained, unconstrained_value is in the unconstrained space implicitly used by the constraint.

R - how to exclude pennystocks from environment before calculating adjusted stock returns

Within my current research I'm trying to find out how big the impact of ad-hoc sentiment on daily stock returns is.
The calculations worked quite well, and the results are plausible.
The calculations so far, using the quantmod package and Yahoo financial data, look like this:
getSymbols(c("^CDAXX", Symbols), env = myenviron, src = "yahoo",
           from = as.Date("2007-01-02"), to = as.Date("2016-12-30"))
Returns <- eapply(myenviron, function(s) ROC(Ad(s), type="discrete"))
ReturnsDF <- as.data.table(do.call(merge.xts, Returns))
# adjust column names
colnames(ReturnsDF) <- gsub(".Adjusted","",colnames(ReturnsDF))
ReturnsDF <- as.data.table(ReturnsDF)
However, to make it more robust against the noisy influence of penny-stock data, I wonder how it's possible to exclude stocks that at any point in the time period drop below a certain value x, let's say 1 €.
I guess the best approach would be to exclude them before calculating the returns and merging the xts objects, or even better, before downloading them with the getSymbols command.
Has anybody an idea how this could work best? Thanks in advance.
Try this:
build a price frame of the Adj. closing prices of your symbols
(I use the PF function of the quantmod add-on package qmao, which has lots of other useful functions for this type of analysis: install.packages("qmao", repos="http://R-Forge.R-project.org"))
check by column if any price is below your minimum trigger price
select only columns which have no closings below the trigger price
To stay more flexible I would suggest taking a sub-period - let's say no price below 5 during the last 21 trading days. The toy example below may illustrate my point.
I use AAPL, FB and MSFT as the symbol universe.
> symbols <- c('AAPL','MSFT','FB')
> getSymbols(symbols, from='2018-02-01')
[1] "AAPL" "MSFT" "FB"
> prices <- PF(symbols, silent = TRUE)
> prices
AAPL MSFT FB
2018-02-01 167.0987 93.81929 193.09
2018-02-02 159.8483 91.35088 190.28
2018-02-05 155.8546 87.58855 181.26
2018-02-06 162.3680 90.90299 185.31
2018-02-07 158.8922 89.19102 180.18
2018-02-08 154.5200 84.61253 171.58
2018-02-09 156.4100 87.76771 176.11
2018-02-12 162.7100 88.71327 176.41
2018-02-13 164.3400 89.41000 173.15
2018-02-14 167.3700 90.81000 179.52
2018-02-15 172.9900 92.66000 179.96
2018-02-16 172.4300 92.00000 177.36
2018-02-20 171.8500 92.72000 176.01
2018-02-21 171.0700 91.49000 177.91
2018-02-22 172.5000 91.73000 178.99
2018-02-23 175.5000 94.06000 183.29
2018-02-26 178.9700 95.42000 184.93
2018-02-27 178.3900 94.20000 181.46
2018-02-28 178.1200 93.77000 178.32
2018-03-01 175.0000 92.85000 175.94
2018-03-02 176.2100 93.05000 176.62
Let's assume you would like any instrument that traded below 175.40 during the last 6 trading days to be excluded from your analysis :-).
As you can see, that shall exclude AAPL and MSFT.
apply and the base function any, applied(!) to a 6-day subset of prices, will give us exactly what we want. Here are the last 3 days of prices, excluding the instruments that did not meet our condition:
> tail(prices[,apply(tail(prices),2, function(x) any(x < 175.4)) == FALSE],3)
FB
2018-02-28 178.32
2018-03-01 175.94
2018-03-02 176.62
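The screening logic itself (drop any symbol whose prices ever dip below the trigger) is language-agnostic; here is a minimal sketch in plain Python with made-up prices, not the quantmod/qmao API:

```python
# Hypothetical recent closing prices per symbol
prices = {
    "AAPL": [178.12, 175.00, 176.21],
    "MSFT": [93.77, 92.85, 93.05],
    "FB":   [178.32, 175.94, 176.62],
}
trigger = 175.4  # exclusion threshold

# Keep only symbols that never closed below the trigger
kept = {sym: px for sym, px in prices.items()
        if not any(p < trigger for p in px)}

print(sorted(kept))  # ['FB']
```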

Scikit-learn: How to extract features from the text?

Assume I have an array of Strings:
['Laptop Apple Macbook Air A1465, Core i7, 8Gb, 256Gb SSD, 15"Retina, MacOS' ... 'another device description']
I'd like to extract from this description features like:
item=Laptop
brand=Apple
model=Macbook Air A1465
cpu=Core i7
...
Should I prepare the pre-defined known features first? Like
brands = ['apple', 'dell', 'hp', 'asus', 'acer', 'lenovo']
cpu = ['core i3', 'core i5', 'core i7', 'intel pdc', 'core m', 'intel pentium', 'intel core duo']
I am not sure that I need CountVectorizer or TfidfVectorizer here; a DictVectorizer seems more appropriate, but how can I make the dicts, with keys and values extracted from the entire string?
Is it possible with scikit-learn's feature extraction? Or should I write my own .fit() and .transform() methods?
UPDATE:
@sergzach, please review whether I understood you right:
data = ['Laptop Apple Macbook..', 'Laptop Dell Latitude...'...]
for d in data:
    for brand in brands:
        if brand in d:
            # ok, brand is found
    for model in models:
        if model in d:
            # ok, model is found
So, creating N loops, one per feature? This might work, but I am not sure it is right and flexible.
Yes, something like the following.
Excuse me; you will probably need to correct the code below.
import re

data = ['Laptop Apple Macbook..', 'Laptop Dell Latitude...'...]

features = {
    'brand': [r'apple', r'dell', r'hp', r'asus', r'acer', r'lenovo'],
    'cpu': [r'core\s+i3', r'core\s+i5', r'core\s+i7', r'intel\s+pdc',
            r'core\s+m', r'intel\s+pentium', r'intel\s+core\s+duo']
    # and other features
}

cat_data = []          # your categories, converted into numbers
not_found_columns = []

for line in data:
    line_cats = {}
    for col, patterns in features.items():
        found = False
        for i, pattern in enumerate(patterns):
            if re.findall(pattern, line.lower(), flags=re.UNICODE):
                line_cats[col] = i + 1  # numeric category for the column, e.g. 2 for dell, 5 for acer
                found = True
                break  # the category is determined by the first occurrence
        # the loop ended without a match: use a default value for a missing feature
        if not found:
            line_cats[col] = 0
            not_found_columns.append((col, line))
    cat_data.append(line_cats)

# cat_data now holds, per line and per column, index+1 of the matched feature, or 0 if none matched.
Now you have the column/line pairs (not_found_columns) that were not matched. Review them; you probably forgot some features.
We can also write strings (instead of numbers) as the categories and then use a DictVectorizer; the two approaches end up equivalent.
Scikit-learn's vectorizers will convert an array of strings into a term-document matrix (a 2-D array with a column for each found term/word). Each row (1st dimension) in the original array maps to a row in the output matrix. Each cell holds a count or a weight, depending on which kind of vectorizer you use and its parameters.
I am not sure this is what you need, based on your code. Could you tell us where you intend to use the features you are looking for? Do you intend to train a classifier? For what purpose?
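For intuition, here is what a count vectorizer's output looks like, as a pure-Python sketch (not the actual scikit-learn API):

```python
docs = ["laptop apple macbook", "laptop dell latitude"]

# Vocabulary: one column per distinct term, in sorted order
vocab = sorted({word for doc in docs for word in doc.split()})

# One row per document; each cell is the term count for that column
matrix = [[doc.split().count(term) for term in vocab] for doc in docs]

print(vocab)   # ['apple', 'dell', 'laptop', 'latitude', 'macbook']
print(matrix)  # [[1, 0, 1, 0, 1], [0, 1, 1, 1, 0]]
```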

Improve accuracy of underlined text

Some underlines in my image are very close to the text. For that particular text, Tesseract is unable to produce accurate results. I have attached the image and the text file. Is there any way I can increase the accuracy of the text?
I have tried to remove the underlines with some image-processing techniques, but the problem is that the lines close to the text are not getting removed.
And are there any parameters in Tesseract that I can use to improve the accuracy? Thanks in advance.
The image which I am trying to run:
Its result:
ARR!
D.
1.
\OCIJHJO‘LI'IJ?‘
3..
10.
E.
F.
SITE NUMBER
ARCHEOLOGICAL DESCRIPTION
General site description SITE IS COVERED WITH LARGE PINES AND IS IN RELATIVLY
GOOD CONDITION, snowING'EITTrE‘SIGNS‘OFTRosmN—EXCEPT—AEONG-Tmm
_"—""NHERE IT DROPS or INTO FLOODPLAIN OF CREEK THERE ARE A EEN ANIMAL TRAILS THAT
HAVE APPARENTLY ERODED OUT IN THE PAST. ONE OF THESE WAS QUIET DEEP ACCORDING
“TO AUGER TEST, BUT HAS FILLED UP WITH SAND AND GROWN OVER AGAIN. FIRST AUGER TEST
“WAS INTO THIS DEE P"GULLY" AND GAVE A FALSE IMPRESSION AS TO THE TRUE DEPTH OF
SITE. THIS TEST HOLE PRODUCED LIEHLQ FLAKES ALL THE WAY DOWN TO 42 INCHES AND
_m STERILE SAND DOWN TO 60 INCHES= REST OF SITE PRODUCED SAND AND CHIPS ONLY TO
l- an I ' A: : I L I i : ‘5!) THIS 3 1.0 5.- 3.. 'Y __
FINE SITE.
Site size .AT L S - E Y CONSIDERABLY MQBE
Nature of archeological deposition EAIBIEIHNDESTURBED EXCEPT ALONG THE EDGES OF SITE
T D0.
Site depth. 20-22 INCHES
Hidden
Faunal preservation
Floral preservation
Human remains
Cultural features (type and number)
Charcoal preservation
DATA RECOVERY METHODS
Ground surface visibility: 0% x 1-251 26—50% 51-75% 76—100%
Description of ground cover iMATURE PINE FOREST
Time spent collecting Number of peeple collecting
Description of surface collecting methods
Type and extent of testing and/or excavation FIVE TEST HOLES WERE SUNK IN SITE WITH 8"
AUGERa THESE WERE TAKEN DOWN IN 6" LEVELS UNTIL STERILE CLAY WAS REACHED. DIRTTA T-
FROM EACH 6" LEVEL WAS SCREENED THROUGH_l/4" WIRE MESH AND ARTIFACTS KEPT FOR
ANALYSIS. ALL TEST HOLES QERE PLQIIED EIIE TRANSIT IN RELATION TO DATUM MARKER
WHI IS A PIPE ‘ _ -: fl' : 3:0. . .: U' J I: : : . !" uFF 3L
GROUND. P__\l: IS I : um \I' :i “I ' I ' .M' I ' D' . I’ I 2! ti 0 .1. ' -. _ .L l' .
ARCHEOLOGICAL COMPONENTS
Paleo-Indian Late Whodland 17th century
Early Archaic Mississippian 18th century
Middle Archaic Late prehistoric 19th century
Late Archaic Unknown prehistoric ___ 20th century __
Early Woodland Ceramic prehistoric ____ Unknown historic
Middle Woodland 16th century

How do I transform text into TF-IDF format using Weka in Java?

Suppose I have the following sample ARFF file with two attributes:
(1) sentiment: positive [1] or negative [-1]
(2) tweet: text
@relation sentiment_analysis
@attribute sentiment {1, -1}
@attribute tweet string
@data
-1,'is upset that he can\'t update his Facebook by texting it... and might cry as a result School today also. Blah!'
-1,'#Kenichan I dived many times for the ball. Managed to save 50\% The rest go out of bounds'
-1,'my whole body feels itchy and like its on fire '
-1,'#nationwideclass no, it\'s not behaving at all. i\'m mad. why am i here? because I can\'t see you all over there. '
-1,'#Kwesidei not the whole crew '
-1,'Need a hug '
1,'#Cliff_Forster Yeah, that does work better than just waiting for it In the end I just wonder if I have time to keep up a good blog.'
1,'Just woke up. Having no school is the best feeling ever '
1,'TheWDB.com - Very cool to hear old Walt interviews! ? http://blip.fm/~8bmta'
1,'Are you ready for your MoJo Makeover? Ask me for details '
1,'Happy 38th Birthday to my boo of alll time!!! Tupac Amaru Shakur '
1,'happy #charitytuesday #theNSPCC #SparksCharity #SpeakingUpH4H '
I want to convert the values of the second attribute into their equivalent TF-IDF values.
By the way, I tried the following code, but its output ARFF file doesn't contain the first attribute for the positive (1) instances.
// Set the tokenizer
NGramTokenizer tokenizer = new NGramTokenizer();
tokenizer.setNGramMinSize(1);
tokenizer.setNGramMaxSize(1);
tokenizer.setDelimiters("\\W");
// Set the filter
StringToWordVector filter = new StringToWordVector();
filter.setAttributeIndicesArray(new int[]{1});
filter.setOutputWordCounts(true);
filter.setTokenizer(tokenizer);
filter.setInputFormat(inputInstances);
filter.setWordsToKeep(1000000);
filter.setDoNotOperateOnPerClassBasis(true);
filter.setLowerCaseTokens(true);
filter.setTFTransform(true);
filter.setIDFTransform(true);
// Filter the input instances into the output ones
outputInstances = Filter.useFilter(inputInstances, filter);
Sample output ARFF file:
@data
{0 -1,320 1,367 1,374 1,397 1,482 1,537 1,553 1,681 1,831 1,1002 1,1033 1,1112 1,1119 1,1291 1,1582 1,1618 1,1787 1,1810 1,1816 1,1855 1,1939 1,1941 1}
{0 -1,72 1,194 1,436 1,502 1,740 1,891 1,935 1,1075 1,1256 1,1260 1,1388 1,1415 1,1579 1,1611 1,1818 2,1849 1,1853 1}
{0 -1,374 1,491 1,854 1,873 1,1120 1,1121 1,1197 1,1337 1,1399 1,2019 1}
{0 -1,240 1,359 2,369 1,407 1,447 1,454 1,553 1,1019 1,1075 3,1119 1,1240 1,1244 1,1373 1,1379 1,1417 1,1599 1,1628 1,1787 1,1824 1,2021 1,2075 1}
{0 -1,198 1,677 1,1379 1,1818 1,2019 1}
{0 -1,320 1,1070 1,1353 1}
{0 -1,210 1,320 2,477 2,867 1,1020 1,1067 1,1075 1,1212 1,1213 1,1240 1,1373 1,1404 1,1542 1,1599 1,1628 1,1815 1,1847 1,2067 1,2075 1}
{179 1,1815 1}
{298 1,504 1,662 1,713 1,752 1,1163 1,1275 1,1488 1,1787 1,2011 1,2075 1}
{144 1,785 1,1274 1}
{19 1,256 1,390 1,808 1,1314 1,1350 1,1442 1,1464 1,1532 1,1786 1,1823 1,1864 1,1908 1,1924 1}
{84 1,186 1,320 1,459 1,564 1,636 1,673 1,810 1,811 1,966 1,997 1,1094 1,1163 1,1207 1,1592 1,1593 1,1714 1,1836 1,1853 1,1964 1,1984 1,1997 2,2058 1}
{9 1,1173 1,1768 1,1818 1}
{86 1,935 1,1112 1,1337 1,1348 1,1482 1,1549 1,1783 1,1853 1}
As you can see, the first few instances are okay (they contain the -1 class along with the other features), but the last instances don't contain the positive class attribute (1).
I mean, there should have been {0 1,...} as the very first entry of those last instances in the output ARFF file, but it is missing.
You have to specify your class attribute explicitly in the Java program because, when you apply the StringToWordVector filter, the input gets expanded into the specified n-grams; hence the class attribute's position changes once StringToWordVector vectorizes the input. You can then use the Reorder filter, which will place the class attribute in the last position, and Weka will pick the last attribute as the class attribute.
More info about reordering in Weka can be found at http://weka.sourceforge.net/doc.stable-3-8/weka/filters/unsupervised/attribute/Reorder.html. Also, example 5 at http://www.programcreek.com/java-api-examples/index.php?api=weka.filters.unsupervised.attribute.Reorder may help you with the reordering.
Hope it helps.
Your process for obtaining TF-IDF seems correct.
According to my experiments, if you have n classes, Weka writes explicit labels for records of n-1 of the classes, and records of the nth class are implied.
In your case you have the two classes -1 and 1, so Weka shows the label in records with class -1, while records with label 1 are implied.
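For intuition about the weighting itself: with setTFTransform(true) and setIDFTransform(true), StringToWordVector applies a log-scaled term frequency multiplied by an inverse document frequency. A pure-Python sketch of that style of weighting (the exact formula Weka uses should be checked against its documentation; log(1 + tf) * log(N / df) is assumed here):

```python
import math

docs = [["need", "a", "hug"],
        ["need", "sleep", "need", "coffee"]]
N = len(docs)

# Document frequency: in how many documents each term appears
df = {}
for doc in docs:
    for term in set(doc):
        df[term] = df.get(term, 0) + 1

def tf_idf(term, doc):
    tf = doc.count(term)  # raw term frequency in this document
    return math.log(1 + tf) * math.log(N / df[term])

# "need" occurs in every document, so its idf (and weight) is 0
print(tf_idf("need", docs[1]))      # 0.0
# "hug" is unique to the first document, so it gets a positive weight
print(tf_idf("hug", docs[0]) > 0)   # True
```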