Planet Primates

October 07, 2016

Planet Theory

Linear algebraic structure of word meanings

Word embeddings capture the meaning of a word using a low-dimensional vector and are ubiquitous in natural language processing (NLP). (See my earlier post 1 and post 2.) It has always been unclear how to interpret the embedding when the word in question is polysemous, that is, has multiple senses. For example, tie can mean an article of clothing, a drawn sports match, or a physical action.

Polysemy is an important issue in NLP, and much work relies upon WordNet, a hand-constructed repository of word senses and their interrelationships. Unfortunately, good WordNets do not exist for most languages, and even the English one is believed to be rather incomplete. Thus considerable effort has gone into methods for automatically discovering the different senses of words.

In this post I will talk about my joint work with Li, Liang, Ma, and Risteski, which shows that word senses are actually easily accessible in many current word embeddings. This goes against conventional wisdom in NLP, which holds that word embeddings cannot capture polysemy, since they use a single vector to represent a word regardless of whether it has one sense or a dozen. Our work shows that the major senses of a word lie in linear superposition within its embedding, and are extractable using sparse coding.

This post uses embeddings constructed using our method and the Wikipedia corpus, but similar techniques also apply (with some loss in precision) to other embeddings described in post 1, such as word2vec, GloVe, or even the decades-old PMI embedding.

A surprising experiment

Take the viewpoint (simplistic yet instructive) that a polysemous word like tie is a single lexical token that represents unrelated words tie1, tie2, … Here is a surprising experiment that suggests that the embedding for tie should be approximately a weighted sum of the (hypothetical) embeddings of tie1, tie2, …

Take two random words $w_1, w_2$. Combine them into an artificial polysemous word $w_{new}$ by replacing every occurrence of $w_1$ or $w_2$ in the corpus by $w_{new}.$ Next, compute an embedding for $w_{new}$ using the same embedding method while deleting embeddings for $w_1, w_2$ but preserving the embeddings for all other words. Compare the embedding $v_{w_{new}}$ to linear combinations of $v_{w_1}$ and $v_{w_2}$.

Repeating this experiment with a wide range of values for the ratio $r$ between the frequencies of $w_1$ and $w_2$, we find that $v_{w_{new}}$ lies close to the subspace spanned by $v_{w_1}$ and $v_{w_2}$: the cosine of its angle with the subspace is on average $0.97$ with standard deviation $0.02$. Thus $v_{w_{new}} \approx \alpha v_{w_1} + \beta v_{w_2}$. We find that $\alpha \approx 1$ whereas $\beta \approx 1- c\lg r$ for some constant $c\approx 0.5$. (Note this formula is meaningful when the frequency ratio $r$ is not too large, i.e. when $ r < 10^{1/c} \approx 100$.) Thanks to this logarithm, the infrequent sense is not swamped out in the embedding, even if it is 50 times less frequent than the dominant sense. This is an important reason behind the success of our method for extracting word senses.
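
The subspace computation behind this check is easy to sketch. Below is a small numpy illustration (with random vectors standing in for real, retrained embeddings) of the cosine between a vector and the span of two others:

```python
import numpy as np

rng = np.random.default_rng(0)

def cos_with_subspace(v, u1, u2):
    """Cosine of the angle between v and span{u1, u2}: the norm ratio
    of v's orthogonal projection onto that plane."""
    B = np.column_stack([u1, u2])
    # Least-squares coefficients give the projection onto the column space of B.
    coeffs, *_ = np.linalg.lstsq(B, v, rcond=None)
    proj = B @ coeffs
    return np.linalg.norm(proj) / np.linalg.norm(v)

d = 300
v1, v2 = rng.standard_normal(d), rng.standard_normal(d)
# A vector built as a linear combination plus small noise lies almost
# entirely inside the subspace, so the cosine is close to 1.
v_new = 1.0 * v1 + 0.6 * v2 + 0.01 * rng.standard_normal(d)
print(round(cos_with_subspace(v_new, v1, v2), 3))
```

In the real experiment $v_{w_{new}}$ comes from retraining on the merged corpus, which is why its near-1 cosine with the span is surprising; in this toy sketch the vector is built inside the span by construction.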

This experiment (to which we were led by our theoretical investigations) is very surprising because the embedding is the solution to a complicated, nonconvex optimization, yet it behaves in such a strikingly linear way. You can read our paper for an intuitive explanation using our theoretical model from post 2.

Extracting word senses from embeddings

The above experiment suggests that

$$v_{tie} \approx \alpha_1\, v_{tie1} + \alpha_2\, v_{tie2} + \alpha_3\, v_{tie3} + \cdots \qquad (1)$$

but this alone is insufficient to mathematically pin down the senses, since $v_{tie}$ can be expressed in infinitely many ways as such a combination. To pin down the senses we will interrelate the senses of different words, for example by relating the “article of clothing” sense tie1 with shoe, jacket, etc.

The word senses tie1, tie2, … correspond to “different things being talked about,” in other words, to different word distributions occurring around tie. Now recall that our earlier paper described in post 2 gives an interpretation of “what’s being talked about”: it is called discourse and is represented by a unit vector in the embedding space. In particular, the theoretical model of post 2 imagines a text corpus as being generated by a random walk on discourse vectors. When the walk is at a discourse $c_t$ at time $t$, it outputs a few words using a loglinear distribution:

$$\Pr[w~\text{emitted at time}~t \mid c_t] \propto \exp(\langle c_t, v_w \rangle). \qquad (2)$$

One imagines there exists a “clothing” discourse that has high probability of outputting the tie1 sense, and also of outputting related words such as shoe, jacket, etc. Similarly there may be a “games/matches” discourse that has high probability of outputting tie2 as well as team, score etc.

By equation (2), the probability of a word being output by a discourse is determined by the inner product, so one expects the vector for the “clothing” discourse to have high inner product with all of shoe, jacket, tie1, etc., and thus it can stand in as a surrogate for $v_{tie1}$ in expression (1)! This motivates the following global optimization:

Given word vectors in $\Re^d$ (totaling about $60,000$ in this case), a sparsity parameter $k$, and an upper bound $m$, find a set of unit vectors $A_1, A_2, \ldots, A_m$ such that

$$v_w = \sum_{j=1}^m \alpha_{w,j} A_j + \eta_w \qquad (3)$$

where at most $k$ of the coefficients $\alpha_{w,1},\dots,\alpha_{w,m}$ are nonzero (a so-called hard sparsity constraint), and $\eta_w$ is a noise vector.

Here $A_1, \ldots, A_m$ represent important discourses in the corpus, which we refer to as atoms of discourse.

Optimization (3) is a surrogate for the desired expansion of $v_{tie}$ in (1) because one can hope that the atoms of discourse will contain atoms corresponding to clothing, sports matches etc. that will have high inner product (close to $1$) with tie1, tie2 respectively. Furthermore, restricting $m$ to be much smaller than the number of words ensures that each atom needs to be used for multiple words, e.g., reuse the “clothing” atom for shoes, jacket etc. as well as for tie.

Both the $A_j$’s and the $\alpha_{w,j}$’s are unknowns in this optimization. This is nothing but sparse coding, which is useful in neuroscience, image processing, computer vision, etc. It is nonconvex and computationally NP-hard in the worst case, but can be solved quite efficiently in practice using the k-SVD algorithm described in Elad’s survey, lecture 4. We solved this problem with sparsity $k=5$ and $m \approx 2000$. (Experimental details are in the paper. Also, some theoretical analysis of such an algorithm is possible; see this earlier post.)
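
To make the setup concrete, here is a small sketch of solving an optimization of the form (3) on synthetic data. It uses scikit-learn's dictionary learning with orthogonal matching pursuit in place of k-SVD; all names and sizes are illustrative, not the paper's:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: n "word vectors" in R^d built as
# k-sparse combinations of m hidden unit-norm atoms, plus noise.
d, m, n, k = 30, 8, 200, 3
true_atoms = rng.standard_normal((m, d))
true_atoms /= np.linalg.norm(true_atoms, axis=1, keepdims=True)
codes = np.zeros((n, m))
for row in codes:
    row[rng.choice(m, size=k, replace=False)] = rng.uniform(0.5, 1.5, size=k)
X = codes @ true_atoms + 0.01 * rng.standard_normal((n, d))

# Sparse coding of X: learn atoms A_j and coefficients alpha_{w,j} under a
# hard sparsity constraint (at most k nonzero coefficients per vector),
# enforced here by OMP at transform time.
learner = DictionaryLearning(n_components=m, transform_algorithm='omp',
                             transform_n_nonzero_coefs=k, random_state=0)
alphas = learner.fit_transform(X)   # the coefficients alpha_{w,j}
atoms = learner.components_         # the learned atoms A_1, ..., A_m
```

On real embeddings one would use $d \approx 300$, $n \approx 60{,}000$, $m \approx 2000$, $k = 5$ as in the text.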

Experimental Results

Each discourse atom defines via (2) a distribution on words, which due to the exponential appearing in (2) strongly favors words whose embeddings have a larger inner product with it. In practice, this distribution is quite concentrated on as few as 50-100 words, and the “meaning” of a discourse atom can be roughly determined by looking at a few nearby words. This is how we visualize atoms in the figures below. The first figure gives a few representative atoms of discourse.
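
As a toy illustration of how an atom picks out its nearby words via (2), here is a sketch with a hypothetical mini-vocabulary and random stand-in vectors (real embeddings would replace them):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical mini-vocabulary with random unit vectors standing in for
# word embeddings, and one atom built near the "clothing" words.
vocab = ["shoe", "jacket", "tie", "score", "team", "galaxy"]
d = 300
vectors = rng.standard_normal((len(vocab), d))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
atom = vectors[0] + vectors[1] + vectors[2]
atom /= np.linalg.norm(atom)

# The log-linear model assigns each word probability proportional to
# exp(<atom, v_w>); the exponential concentrates mass on words whose
# vectors have a large inner product with the atom.
logits = vectors @ atom
probs = np.exp(logits) / np.exp(logits).sum()
top = [vocab[i] for i in np.argsort(-probs)[:3]]
print(top)
```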

A few of the 2000 atoms of discourse found

And here are the discourse atoms used to represent two polysemous words, tie and spring

Discourse atoms expressing the words tie and spring.

You can see that the discourse atoms do correspond to senses of these words.

Finally, we also have a technique that, given a target word, generates representative sentences according to its various senses as detected by the algorithm. Below are the sentences returned for ring. (N.B. The mathematical meaning was missing in WordNet but was picked up by our method.)

Representative sentences for different senses of the word ring.

A new testbed for testing comprehension of word senses

Many tests have been proposed to test an algorithm’s grasp of word senses. They often involve hard-to-understand metrics such as distance in WordNet, or are tied to performance on specific applications like web search.

We propose a simple new test (inspired by the word-intrusion tests for topic coherence due to Chang et al. 2009) which has the advantage of being easy to understand, and which can also be administered to humans.

We created a testbed using 200 polysemous words and their 704 senses according to WordNet. Each “sense” is represented by a set of 8 related words; these were collected from WordNet and online dictionaries by college students, who were told to identify the most relevant other words occurring in the online definitions of the word sense as well as in the accompanying illustrative sentences. These 8 words are considered the ground-truth representation of the word sense: e.g., for the “tool/weapon” sense of axe they were: handle, harvest, cutting, split, tool, wood, battle, chop.

Police line-up test for word senses: the algorithm is given one of these 200 polysemous words chosen at random, together with a set of $m$ senses which contains the true senses of the word as well as some distractors, which are randomly picked senses from other words in the testbed. The test taker has to identify the word’s true senses among these $m$ senses.

As usual, accuracy is measured using precision (what fraction of the algorithm/human’s guesses were correct) and recall (how many correct senses were among the guesses).
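
In code, the two metrics amount to the following (the numbers below are made up for illustration):

```python
# Precision: what fraction of the guesses were true senses.
# Recall: what fraction of the true senses were guessed.
def precision_recall(guessed, true_senses):
    guessed, true_senses = set(guessed), set(true_senses)
    hits = len(guessed & true_senses)
    return hits / len(guessed), hits / len(true_senses)

# Hypothetical run: 3 true senses hidden among the m candidates,
# 4 guesses, 2 of them correct.
p, r = precision_recall(guessed={2, 7, 11, 19}, true_senses={2, 5, 7})
print(p, r)
```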

For $m=20$ and $k=4$, our algorithm succeeds with precision $63\%$ and recall $70\%$, and performance remains reasonable for $m=50$. We also administered the test to a group of grad students. Native English speakers had precision/recall scores in the $75$ to $90$ percent range. Non-native speakers had scores roughly similar to our algorithm.

Our algorithm works something like this: If $w$ is the target word, then take all discourse atoms computed for that word, and compute a certain similarity score between each atom and each of the $m$ senses, where the words in the senses are represented by their word vectors. (Details are in the paper.)


Word embeddings have been useful in a host of other settings, and now it appears that they can also easily yield the different senses of a polysemous word. We have some subsequent applications of these ideas to other previously studied settings, including topic models, creating WordNets for other languages, and understanding the semantic content of fMRI brain measurements. I’ll describe some of them in future posts.

October 07, 2016 01:00 PM

August 24, 2016


North Korea has now demonstrated a missile launch from a ...

North Korea has now demonstrated a missile launch from a submarine. That is an important signal for North Korea; this is the point from which no country can afford to attack North Korea anymore. The message is: even if you wipe us out completely with a nuclear strike, we can still flatten you in retaliation from our submarines.

Well, nobody was planning to wipe out North Korea anyway. But I suspect that all the paranoiacs in North Korea will sleep better from now on.

August 24, 2016 01:01 PM


Apply a list of Python functions in order elegantly

I have an input value val and a list of functions to be applied in the order:

funcs = [f1, f2, f3, ..., fn]

How can I apply them elegantly, without writing

fn( ... (f3(f2(f1(val))) ... )

and also without using a for loop:

tmp = val
for f in funcs:
    tmp = f(tmp)
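
One standard idiom for this (a suggestion, not part of the original question) is functools.reduce, which folds the value through the functions from left to right:

```python
from functools import reduce

def pipe(val, funcs):
    """Apply funcs in order: fn(...f3(f2(f1(val)))...)."""
    return reduce(lambda acc, f: f(acc), funcs, val)

# Toy example: (3 + 1) * 2, then converted to a string.
funcs = [lambda x: x + 1, lambda x: x * 2, str]
print(pipe(3, funcs))  # prints 8
```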

by Viet at August 24, 2016 12:49 PM


IR Swaps - Curve sensitivity at maturity node

I was recently trying to price some IR swaps in BBG. I noticed that when I shock the yield curve up by 1bp at a single specific node, the DV01 is close to zero except at the node nearest the maturity. Nearly 100% of the DV01 for a parallel shift comes from the shock to the node near maturity.

I don't really understand this, since I would expect every node to have similar risk, perhaps slightly increasing the further away you are.

I see this trend with every IR Swap that I look at.

Clearly I am missing some understanding of the exposure of IR swaps; could anyone here help me?


Note: I'm looking at the combined legs in this case.

by user3365528 at August 24, 2016 12:45 PM


How to inject dependencies in Aurelia without using ES6 class feature

How do I inject dependencies when exporting a function instead of a class?

by Ganbin at August 24, 2016 12:38 PM

What impact does vocabulary_size have on word2vec tensorflow implementation?

I've performed the steps in this guide to generate a vector representation of words.

Now I'm running word2vec on a custom dataset of 45'000 words.

To run I modified to use my own dataset by modifying to words = read_data('')

I encountered an issue similar to and so reduced the vocabulary_size to 200. It now runs, but the results do not appear to capture the context. For example, here is a sample output:

Nearest to Leave: Employee, it, •, due, You, appeal, Employees, which,

What can I infer from this output? Will increasing/decreasing vocabulary_size improve the results?

I'm using python3 so to run I use python3

by blue-sky at August 24, 2016 12:36 PM

Image clustering by its similarity in python

I have a collection of photos and I'd like to distinguish clusters of the similar photos. Which features of an image and which algorithm should I use to solve my task?
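
One simple baseline (a sketch on synthetic images; for real photos, learned CNN features usually beat raw color statistics): describe each image by a global color histogram and cluster the histograms with k-means:

```python
import numpy as np
from sklearn.cluster import KMeans

def color_histogram(img, bins=8):
    """Concatenated per-channel histograms of an H x W x 3 uint8 image."""
    return np.concatenate([
        np.histogram(img[..., c], bins=bins, range=(0, 256), density=True)[0]
        for c in range(3)])

# Synthetic stand-ins for photos: two mostly-red and two mostly-blue images.
rng = np.random.default_rng(0)
def fake_photo(dominant_channel):
    img = rng.integers(0, 50, size=(32, 32, 3))
    img[..., dominant_channel] = rng.integers(180, 256, size=(32, 32))
    return img.astype(np.uint8)

photos = [fake_photo(0), fake_photo(0), fake_photo(2), fake_photo(2)]
features = np.array([color_histogram(p) for p in photos])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)  # the two red photos share one label, the two blue the other
```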

by Oleksandr at August 24, 2016 12:31 PM


How to predict VaR changes on a DoD basis?

I am trying to predict the change in VaR on a DoD (day-over-day) basis. Say at t=0 I have my VaR based on full valuation. At t=1, I will have another VaR based on full valuation. I am trying to predict this VaR at t=1 without a full revaluation. I also have Composite VaR and Incremental VaR at t=0. At t=1, I will also know how my risk factors have changed and whether there are any new or dropped trades in my portfolio.

What will be the best way to proceed? Links to any reference material will also be highly appreciated.

by Deb at August 24, 2016 12:30 PM


reinforcement learning in gridworld with subgoals

Andrew Ng, Daishi Harada, Stuart Russell published a conference paper entitled Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping.

There is a specific example there that I am extremely curious/interested about. It is in Figure 2(a) of the paper:

5x5 grid world with 5 subgoals (including the goal state), which must be visited in order 1, 2, 3, 4, G

It is about a 5x5 gridworld with start and goal states in opposite corners. The catch is, the agent must learn to go from start to end BY VISITING specific subgoals 1,2,3,4 IN ORDER.

Has anyone seen/understood the code for this? I want to know how the reward function/shaping is given in this kind of problem.

I am interested to know how the flow of this modification to the grid world is written.

by cgo at August 24, 2016 12:26 PM



No touchpad on freebsd on an acer aspire11

First time using FreeBSD. I installed it on an Acer Aspire V11. Neither the touchpad nor the touchscreen works. From what I understand there should be a device /dev/psm0 that represents the touchpad, but there is none.

Searching for getting a touchpad working gives results that all seem to assume psm0 exists. Things for linux suggest setting i8042.nopnp as a kernel option, but I don't know what the equivalent here would be.

by Adam at August 24, 2016 12:15 PM


Update a field in an Elm-lang record via dot function?

Is it possible to update a field in an Elm record via a function (or some other way) without explicitly specifying the precise field name?


> fields = { a = 1, b = 2, c = 3 }
> updateField fields newVal fieldToUpdate = { fields | fieldToUpdate <- newVal }
> updateField fields 5 .a -- does not work


To add some context, I'm trying to DRY up the following code:

UpdatePhraseInput contents ->
  let currentInputFields = model.inputFields
  in { model | inputFields <- { currentInputFields | phrase <- contents }}

UpdatePointsInput contents ->
  let currentInputFields = model.inputFields
  in { model | inputFields <- { currentInputFields | points <- contents }}

Would be really nice if I could call a mythical updateInput function like this:

UpdatePhraseInput contents -> updateInput model contents .phrase
UpdatePointsInput contents -> updateInput model contents .points

by seanomlor at August 24, 2016 12:12 PM

Tensorflow Training and Validation Input Queue Separation

I tried to replicate the Fully Convolutional Network results using TensorFlow. I used Marvin Teichmann's implementation from GitHub, so I only needed to write the training wrapper. I create two graphs that share variables, and two input queues, one for training and one for validation. To test my training wrapper, I used two short lists of training and validation files, and I run a validation immediately after every training epoch. I also printed out the shape of every image from the input queue to check whether I get the correct input. However, after I started the training, it seems that only images from the training queue are being dequeued: both the training and validation graphs take input from the training queue, and the validation queue is never accessed. Can anyone help explain and solve this problem?

Here's part of the relevant code:

def get_data(image_name_list, num_epochs, scope_name, num_class = NUM_CLASS):
    with tf.variable_scope(scope_name) as scope:
        images_path = [os.path.join(DATASET_DIR, i+'.jpg') for i in image_name_list]
        gts_path = [os.path.join(GT_DIR, i+'.png') for i in image_name_list]
        seed = random.randint(0, 2147483647)
        image_name_queue = tf.train.string_input_producer(images_path, num_epochs=num_epochs, shuffle=False, seed = seed)
        gt_name_queue = tf.train.string_input_producer(gts_path, num_epochs=num_epochs, shuffle=False, seed = seed)
        reader = tf.WholeFileReader()
        image_key, image_value =
        my_image = tf.image.decode_jpeg(image_value)
        my_image = tf.cast(my_image, tf.float32)
        my_image = tf.expand_dims(my_image, 0)
        gt_key, gt_value =
        # gt stands for ground truth
        my_gt = tf.cast(tf.image.decode_png(gt_value, channels = 1), tf.float32)
        my_gt = tf.one_hot(tf.cast(my_gt, tf.int32), NUM_CLASS)
        return my_image, my_gt

train_image, train_gt = get_data(train_files, NUM_EPOCH, 'training')
val_image, val_gt = get_data(val_files, NUM_EPOCH, 'validation')
with tf.variable_scope('FCN16') as scope:
    train_vgg16_fcn = fcn16_vgg.FCN16VGG(), train=True, num_classes=NUM_CLASS, keep_prob=KEEP_PROB)
    val_vgg16_fcn = fcn16_vgg.FCN16VGG(), train=False, num_classes=NUM_CLASS, keep_prob=1)

# Define the loss, evaluation metric, summary and saver in the computation
# graph, then initialize variables and start a session.
for epoch in range(starting_epoch, NUM_EPOCH):
    for i in range(train_num):
        _, loss_value, shape =[train_op, train_entropy_loss, tf.shape(train_image)])
        print shape
    for i in range(val_num):
        loss_value, shape =[val_entropy_loss, tf.shape(val_image)])
        print shape

by Ruizhi Deng at August 24, 2016 12:12 PM


Something-Treewidth Property

Let $s$ be a graph parameter (e.g., diameter, domination number, etc.).

A family $\mathcal{F}$ of graphs has the $s$-treewidth property if there is a function $f$ such that for any graph $G\in \mathcal{F}$, the treewidth of $G$ is at most $f(s(G))$.

For instance, let $s = \mathit{diameter}$, and $\mathcal{F}$ be the family of planar graphs. Then it is known that any planar graph of diameter at most $s$ has treewidth at most $O(s)$. More generally, Eppstein showed that a family of graphs has the diameter-treewidth property if and only if it excludes some apex graph as a minor. Examples of such families are graphs of constant genus, etc.

As another example, let $s = \mathit{domination{-}number}$. Fomin and Thilikos proved a result analogous to Eppstein's by showing that a family $\mathcal{F}$ of graphs has the domination-number-treewidth property if and only if $\mathcal{F}$ has bounded local treewidth. Note that this happens if and only if $\mathcal{F}$ has the diameter-treewidth property.


  1. For which graph parameters $s$ is the $s$-treewidth property known to hold on planar graphs?
  2. For which graph parameters $s$ is the $s$-treewidth property known to hold on graphs of bounded local-treewidth?
  3. Are there any other families of graphs, not comparable to graphs of bounded local-treewidth for which the $s$-treewidth property holds for some suitable parameter $s$?

I have a feeling that these questions have some relation with the theory of bidimensionality. Within this theory, there are several important parameters. For instance, the sizes of feedback vertex set, vertex cover, minimum maximal matching, face cover, dominating set, edge dominating set, R-dominating set, connected dominating set, connected edge dominating set, connected R-dominating set, etc.

  1. Does any parameter $s$ encountered in bidimensionality theory have the $s$-treewidth property for some suitable family of graphs?

by Springberg at August 24, 2016 12:05 PM


How to prove that the reversal of the concatenation of two strings is the concatenation of the reversals?

Given languages $L_1$ and $L_2$, how do we prove that $$(L_1L_2)^{\mathrm{rev}} = (L_2^{\mathrm{rev}})(L_1^{\mathrm{rev}})\,,$$ where ${}^{\mathrm{rev}}$ denotes reversal?

I think it can be shown by mathematical induction: first for two strings of length $1$ each, and then by induction on one of the lengths, repeated for the other length. Is there a simpler way of proving this without making it so big with the induction method?
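
One simpler route (a sketch, notation mine): prove the identity for strings by a single induction on the length of the second string, then lift it to languages elementwise. Define $\varepsilon^{\mathrm{rev}} = \varepsilon$ and $(xa)^{\mathrm{rev}} = a\,x^{\mathrm{rev}}$ for a letter $a$. Base case: $(x\varepsilon)^{\mathrm{rev}} = x^{\mathrm{rev}} = \varepsilon^{\mathrm{rev}}\, x^{\mathrm{rev}}$. Inductive step: writing $y = y'a$ and using the induction hypothesis on $y'$,

$$(xy)^{\mathrm{rev}} = ((xy')a)^{\mathrm{rev}} = a\,(xy')^{\mathrm{rev}} = a\,(y'^{\mathrm{rev}} x^{\mathrm{rev}}) = (a\, y'^{\mathrm{rev}})\, x^{\mathrm{rev}} = (y'a)^{\mathrm{rev}}\, x^{\mathrm{rev}} = y^{\mathrm{rev}} x^{\mathrm{rev}}.$$

For languages, the string identity gives $(L_1 L_2)^{\mathrm{rev}} = \{(xy)^{\mathrm{rev}} : x \in L_1,\ y \in L_2\} = \{y^{\mathrm{rev}} x^{\mathrm{rev}} : x \in L_1,\ y \in L_2\} = L_2^{\mathrm{rev}} L_1^{\mathrm{rev}}$.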

by aste123 at August 24, 2016 12:02 PM


At the thought that some beaches in southern France ...

At the thought that some beaches in southern France are hunting down burkini wearers and imposing fines, certain Nazi comparisons come to mind for some people (myself included). Marauding skinhead hordes who want to go "bash foreigners". So what does that look like in practice?

What it looks like in practice can be seen here.


The French ban on the burkini is threatening to turn into a farce as police officers armed with pepper spray and batons marched onto a beach today and ordered a woman to strip off.

Four burly cops stood over the middle-aged woman, who had been quietly sunbathing on the Promenade des Anglais beach in Nice - yards from the scene of the Bastille Day lorry attack - and watched her take off a Muslim-style garment which protected her modesty.

Now, some caution is in order, because the Daily Mail is a) a sinister tabloid rag, b) conservative and British, with the French as its designated enemy, but on the other hand c) itself happily agitates against "the terrorists". Still, I think the pictures speak for themselves.

And as a bonus, the photo of the nuns on the beach has surfaced, the one that recently cost the Italian imam his Facebook account.

Update: And in case anyone doesn't understand the burka debate in the first place: here is some assistance.

Update: A reader writes in:

I am currently in Nice. The beach where this happened is probably a 10-minute walk from me. You see the clothing shown in the photo every now and then. But over the last few days I never noticed anyone looking at it oddly, or the police turning up.

Yesterday we were at a beach in a suburb of Nice, and neither the police nor the other beachgoers cared that a mother was wearing this clothing. The police merely reminded a few smokers of the smoking ban on that beach, though without handing out tickets, and the cigarettes were lit again the moment the patrol was out of sight. Just to put things in proportion.

Update: Here is another good piece on the burkini debate. Money quote:

“Over 40 percent of our sales are from non-Muslim women,” she says. “The Jewish community embraces it. I’ve seen Mormons wearing it. A Buddhist nun purchased it for all of her friends. I’ve seen women who have issues with skin cancer or body image, moms, women who are not comfortable exposing their skin — they’re all wearing it.

There is just one detail I don't understand about it. Why lie in the sun in a burkini at all? Isn't the point of sunbathing that the sun reaches your skin? And isn't that exactly what the burkini prevents? Not that I would presume to tell anyone whether or how to lie on the beach, but that detail escapes me.

August 24, 2016 12:01 PM


Finding a value in a sorted array in log R time, R is the number of distinct elements

The standard binary search algorithm takes log N time, where N is the total number of elements in the array. When the array has duplicates, I don't see how you could detect those duplicates ahead of time. (Iterating through the array takes N time, which is too much.) Consequently, how do you improve the performance from log N to log R?

by Anonymous at August 24, 2016 11:56 AM

If you have a Cook reduction in both directions, do you also have a Karp reduction?

If there exists a polynomial reduction of a decision problem $\mathcal{P}_1$ into another decision problem $\mathcal{P}_2$ and also a polynomial reduction of $\mathcal{P}_2$ into $\mathcal{P}_1$, then is there also a polynomial transformation between $\mathcal{P}_1$ and $\mathcal{P}_2$?

These are the definitions I use:

Cook reduction
$\mathcal{P}_1$ polynomially reduces to $\mathcal{P}_2$ if there is a polynomial-time oracle algorithm for $\mathcal{P}_1$ using an oracle for $\mathcal{P}_2$.

Karp reduction
$\mathcal{P}_1=(X_1,Y_1)$ polynomially transforms to $\mathcal{P}_2=(X_2,Y_2)$ if there is a function $f:X_1\rightarrow X_2$ computable in polynomial time such that for all $x\in X_1$, $x\in Y_1$ if and only if $f(x)\in Y_2$.

by Héctor Andrade at August 24, 2016 11:52 AM

Planet Theory

Proceedings of ICALP 2016

The proceedings of ICALP 2016 are now available from the LIPIcs web site. Many thanks to all the colleagues who have worked so hard to make this possible.

I hope that many of you will read the papers in the proceedings, which were selected by Davide, Michael, Yuval and  their PCs, and build on their research contributions.

by Luca Aceto at August 24, 2016 11:45 AM


About how to balance imbalanced data

When I read Decision Tree in Scikit learn, I find:

Balance your dataset before training to prevent the tree from being biased toward the classes that are dominant. Class balancing can be done by sampling an equal number of samples from each class, or preferably by normalizing the sum of the sample weights (sample_weight) for each class to the same value.

In the link:

I am confused.


Class balancing can be done by sampling an equal number of samples from each class

If I do it like this, should I add a proper sample weight for the samples in each class (or add class sample...)?

For example, if I have two classes: A and B with number of samples

A:100 B:10000

Can I input 10000 samples for each and set weight:

input samples of A:10000, input samples of B:10000

weight of A:0.01 , weight of B: 1.0


But it still said:

preferably by normalizing the sum of the sample weights (sample_weight) for each class to the same value

I am totally confused by it. Does it mean I should input 100 samples of A and 10000 samples of B, and then set the weights:

input samples of A:100, input samples of B:10000

weight of A:1.0 , weight of B: 1.0

But then it seems I have done nothing to balance the imbalanced data.

Which way is better, and what is the meaning of the second way in scikit-learn? Can anyone help me clarify it?
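
To spell out the second option on these numbers (a sketch; the per-class weight n_samples / (n_classes * n_c) is the same formula scikit-learn's class_weight='balanced' uses):

```python
import numpy as np

y = np.array([0] * 100 + [1] * 10000)   # class A: 100 samples, class B: 10000

# Keep every sample, but choose sample_weight so that each class's weights
# sum to the same value: each sample of class c gets n_samples / (n_classes * n_c).
classes, counts = np.unique(y, return_counts=True)
per_class = len(y) / (len(classes) * counts)
sample_weight = per_class[np.searchsorted(classes, y)]

# Class A contributes 100 * 50.5 = 5050 and class B 10000 * 0.505 = 5050
# to the weighted loss, so the classes are balanced without resampling.
print(sample_weight[y == 0].sum(), sample_weight[y == 1].sum())
```

So in the example above: either oversample A to 10000 and use weight 1.0 everywhere, or keep the original 100 + 10000 samples and pass weights like these; keeping the original counts with weight 1.0 for both classes indeed does nothing to balance them.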

by insomnia at August 24, 2016 11:13 AM



Training Hidden Markov Model in R

Is it possible to train a Hidden Markov Model in R? I have a set of observations with corresponding labels, and I need to train an HMM to obtain the Markov parameters (i.e., the transition probability matrix, the emission probability matrix, and the initial distribution), so that I can make predictions for future observations.

In other words, I need the opposite of the Forward-Backward algorithm.

by asker at August 24, 2016 11:08 AM

Fred Wilson

Trapped In A System

A book that has really stayed with me since I read it is The Prize, the story of the attempt to reform the Newark public school system.

And there is a particular scene in that book that really sums it up for me.

The author is at an anti-charter school protest and meets a woman who had spent that morning trying to get her son into a new charter school that had opened in Newark. The author asks the woman how it is possible that on the same day she would spend the morning trying to get her son into a charter school and the afternoon at an anti-charter protest.

The woman explains that most of her family are employed in good paying union jobs in the district schools and that the growth of charters is a threat to those jobs.

As I read that story I was struck by how rational the woman was acting. She was helping to preserve a system that provided an economic foundation for her family and at the same time opting her son out of it. 

In some ways that story is a microcosm of what is happening in the economy right now. Many people in the US (and around the world) are employed by (and trapped in) a system that no longer works very well. And although they realize the system is broken, they fight to support it because it underpins their economic security.

My partner Albert argues for a universal basic income to replace the old and broken system so we as a society can free ourselves from outdated approaches that don’t work anymore and move to adopt new and better systems. 

I think it is worth a shot to be honest.

by Fred Wilson at August 24, 2016 11:04 AM


What is a Short Option Hedging Portfolio?

In his book 'Stochastic Calculus for Finance II', Shreve uses the term 'Short Option Hedging Portfolio' on page 156 (4.5.3). Can someone please explain this term with some kind of example? It is preventing me from understanding why the portfolio value evolution is equated with the option value evolution to derive the Black-Scholes-Merton differential equation.


by Sagar Kolte at August 24, 2016 11:02 AM


Tensorflow multi-variable logistic regression not working

I am trying to create a program which will classify a point as either 1 or 0 using Tensorflow. I am trying to create an oval shape around the center of this plot, where the blue dots are:

Everything in the oval should be classified as 1, every thing else should be 0. In the graph above, the blue dots are 1s and the red x's are 0s.

However, every time I try to classify a point, it always chooses 1, even for points I trained it with that were labeled 0.

My question is simple: Why is the guess always 1, and what am I doing wrong or should do differently to fix this problem? This is my first machine learning problem I have tried without a tutorial, so I really don't know much about this stuff.

I'd appreciate any help you can give, thanks!

Here's my code:

#!/usr/bin/env python3

import tensorflow as tf
import numpy
import matplotlib.pyplot as plt

training_in = numpy.array([[0, 0], [1, 1], [2, 0], [-2, 0], [-1, -1], [-1, 1], [-1.5, 1],   [3, 3], [3, 0], [-3, 0], [0, -3], [-1, 3], [1, -2], [-2, -1.5]])
training_out = numpy.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0])

def transform_data(x):
    return [x[0], x[1], x[0]**2, x[1]**2, x[0]*x[1]]

new_training_in = numpy.apply_along_axis(transform_data, 1, training_in)

feature_count = new_training_in.shape[1]

x = tf.placeholder(tf.float32, [None, feature_count])
y = tf.placeholder(tf.float32, [None, 1])

W = tf.Variable(tf.zeros([feature_count, 1]))
b = tf.Variable(tf.zeros([1]))

guess = tf.nn.softmax(tf.matmul(x, W) + b)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(tf.matmul(x, W) + b, y))

opti = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

init = tf.initialize_all_variables()
sess = tf.Session()

for i in range(1000):
    for (item_x, item_y) in zip(new_training_in, training_out):, feed_dict={ x: [item_x], y: [[item_y]] })


plt.plot(training_in[:6, 0], training_in[:6, 1], 'bo')
plt.plot(training_in[6:, 0], training_in[6:, 1], 'rx')

results =, feed_dict={ x: new_training_in })

for i in range(training_in.shape[0]):
    xx = [training_in[i:,0]]
    yy = [training_in[i:,1]]
    res = results[i]

    # this always prints `[ 1.]`

    # uncomment these lines to see the guesses
    # if res[0] == 0:
    #     plt.plot(xx, yy, 'c+')
    # else:
    #     plt.plot(xx, yy, 'g+')
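
For context on why the output is always 1 (a numpy sketch of the math, not the asker's code): softmax normalizes across the output units, so with a single output unit it returns exactly 1 for any logit. A sigmoid output (or a second output column) is the usual fix.

```python
import numpy as np

def softmax(z):
    # Softmax over the last axis, with the usual max-subtraction for stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# With one output unit, softmax([z]) = e^z / e^z = 1, whatever z is.
single_logit = np.array([[-5.0], [0.0], [7.3]])
print(softmax(single_logit))   # every row is [1.]
```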

by Addison at August 24, 2016 11:00 AM


Optimal Portfolio Asset Weights

I have calculated an optimal portfolio, using a historical covariance matrix, and determined the weights of n risky assets in the optimal portfolio.

The utility function is $U = E(R) - 0.5 \, A \, \sigma^2$, where $\sigma^2$ is the portfolio variance.

I am wondering: what makes certain assets receive high weights, and what makes others receive low weights?
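
For the unconstrained version of this problem the answer follows from the first-order condition: maximizing $w^\top \mu - 0.5\,A\,w^\top \Sigma w$ gives $w^* = \frac{1}{A}\Sigma^{-1}\mu$, so weights rise with expected return and fall with variance and with covariance to the other holdings. A small numpy sketch with made-up inputs (the returns and covariances below are purely illustrative):

```python
import numpy as np

# Hypothetical expected returns and covariance matrix for 3 assets.
mu = np.array([0.08, 0.05, 0.03])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.02, 0.00],
                  [0.00, 0.00, 0.01]])
A = 3.0  # risk-aversion coefficient

# Maximizing U = w'mu - 0.5*A*w'Sigma*w without constraints gives
# w* = (1/A) * Sigma^{-1} mu.
w = np.linalg.solve(Sigma, mu) / A
```

With constraints (e.g. weights summing to one, no shorting) the same intuition holds but the solution comes from a quadratic programming solver instead.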

by Alex at August 24, 2016 10:58 AM


Best range of parameters in grid search?

I would like to run a naive implementation of grid search with MLlib, but I am a bit confused about choosing the 'best' range of parameters. Obviously, I do not want to waste resources on parameter combinations that will probably not yield an improved model. Any suggestions from your experience?

set parameter ranges:

val intercept   : List[Boolean]  = List(false)
val classes     : List[Int]      = List(2)
val validate    : List[Boolean]  = List(true)
val tolerance   : List[Double]   = List(0.0000001 , 0.000001 , 0.00001 , 0.0001 , 0.001 , 0.01 , 0.1 , 1.0)
val gradient    : List[Gradient] = List(new LogisticGradient() , new LeastSquaresGradient() , new HingeGradient())
val corrections : List[Int]      = List(5 , 10 , 15)
val iters       : List[Int]      = List(1 , 10 , 100 , 1000 , 10000)
val regparam    : List[Double]   = List(0.0 , 0.0001 , 0.001 , 0.01 , 0.1 , 1.0 , 10.0 , 100.0)
val updater     : List[Updater]  = List(new SimpleUpdater() , new L1Updater() , new SquaredL2Updater())

perform grid search:

val combinations = for (a <- intercept;
                        b <- classes;
                        c <- validate;
                        d <- tolerance;
                        e <- gradient;
                        f <- corrections;
                        g <- iters;
                        h <- regparam;
                        i <- updater) yield (a,b,c,d,e,f,g,h,i)

for( ( interceptS , classesS , validateS , toleranceS , gradientS , correctionsS , itersS , regParamS , updaterS ) <- combinations.take(3) ) {

      val lr : LogisticRegressionWithLBFGS = new LogisticRegressionWithLBFGS().



by user706838 at August 24, 2016 10:56 AM


Open source equity/bond index data

I have been using the tseries package in R (get.hist.quote) to get historical quotes for various indices from Yahoo Finance. I am interested in DAX, VDAX, EB.REXX and the DJ UBS Commodity Index. When I tried to expand the time window for my analyses, I saw that all time series except DAX and VDAX are discontinued.

My questions:

1) Do you know why EB.REXX (the symbol was REX.DE) disappeared from Yahoo Finance (I now use EB.REXX 10 years, REX0.DE, but it is also discontinued), and why I cannot find the DJ UBS Commodity Index (symbol: ^DJUBS) anymore?

I use code like


get.hist.quote(instrument="REX0.DE", start="2006-01-01", quote=c("AdjClose"), compression="d")
get.hist.quote(instrument="^DJUBS", start="2006-01-01", quote=c("AdjClose"), compression="d")

but both time series end in the 2nd half of 2012.

2) Do you know any R-compatible open data source where I can get

  1. a price or performance index for German or core-EURO government bonds (like eb.rexx)
  2. a price or performance index for broad commodities (like DJ UBS Cdty Index)?

EDIT: I started to try getSymbols from the quantmod package.

  1. In google finance I found INDEXDB:RXPG for EB.REXX and INDEXDJX:DJUBS for DJ UBS - are these the correct indices? Where do I find any description of the data?
  2. The example taken from the manual - getSymbols("MSFT",src="google") - works, but what I would need for the index data - getSymbols("INDEXDB:RXPG",src="google") - does not ...

by Richard at August 24, 2016 10:49 AM


How to add another feature (length of text) to current bag of words classification? Scikit-learn

I am using bag of words to classify text. It's working well but I am wondering how to add a feature which is not a word.

Here is my sample code.

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier

X_train = np.array(["new york is a hell of a town",
                    "new york was originally dutch",
                    "new york is also called the big apple",
                    "nyc is nice",
                    "the capital of great britain is london. london is a huge metropolis which has a great many number of people living in it. london is also a very old town with a rich and vibrant cultural history.",
                    "london is in the uk. they speak english there. london is a sprawling big city where it's super easy to get lost and i've got lost many times.",
                    "london is in england, which is a part of great britain. some cool things to check out in london are the museum and buckingham palace.",
                    "london is in great britain. it rains a lot in britain and london's fogs are a constant theme in books based in london, such as sherlock holmes. the weather is really bad there.",])
y_train = [[0],[0],[0],[0],[1],[1],[1],[1]]

X_test = np.array(["it's a nice day in nyc",
                   "i loved the time i spent in london, the weather was great, though there was a nip in the air and i had to wear a jacket."])
target_names = ['Class 1', 'Class 2']

classifier = Pipeline([
    ('vectorizer', CountVectorizer(min_df=1, max_df=2)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))]), y_train)
predicted = classifier.predict(X_test)
for item, labels in zip(X_test, predicted):
    print '%s => %s' % (item, ', '.join(target_names[x] for x in labels))

Now it is clear that the texts about London tend to be much longer than the texts about New York. How would I add the length of the text as a feature? Do I have to use another way of classifying and then combine the two predictions? Is there any way of doing it along with the bag of words? Some sample code would be great; I'm very new to machine learning and scikit-learn.
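
In scikit-learn the usual route for this is a FeatureUnion with a custom transformer, but the underlying idea is just stacking an extra column onto the word-count matrix. A pure-numpy sketch of that idea (the `bow` matrix below is a made-up stand-in for CountVectorizer output):

```python
import numpy as np

docs = ["new york is a hell of a town",
        "london is a huge metropolis with a rich and vibrant cultural history"]

# Stand-in bag-of-words matrix: rows = documents, cols = vocabulary counts.
# In practice this would come from CountVectorizer().fit_transform(docs).toarray().
bow = np.array([[1.0, 0.0, 2.0],
                [0.0, 3.0, 1.0]])

# Extra non-word feature: document length in tokens.
lengths = np.array([[float(len(d.split()))] for d in docs])

# Stack the new feature as one more column of the design matrix.
X = np.hstack([bow, lengths])
```

The combined matrix can then be fed to any classifier; scaling the length column is usually advisable so it doesn't dominate the word counts.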

by aaravam at August 24, 2016 10:42 AM


How much to invest to reach a target?

Your current wealth is $W$. Each day you can invest some of it; there's a probability $p$ that you will win as much as you invested, $1-p$ that you will lose it. You want to reach a target wealth $W_T$ within $n$ days. Each day, you can choose the fraction $f$ of your wealth to invest. How do you choose $f$ to maximise the chance to hit your target in time?

If it helps, assume $p > 0.5$, $n \gg 1$.

This is essentially a pure maths problem, but I thought it would be interesting for quants. I have seen discussions of similar problems (e.g. "Can you do better than Kelly in the short run?", Browne (2000)), but they assume a continuous outcome and a few other things. I'd also be happy with a way to find $f$ via simulations; an analytical formula is not essential.

[Edit: you cannot bet more than you currently have. I should have specified this earlier.]
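
For the numerical route, here is one hedged sketch (my own formulation, not from any reference): discretize $f$ and compute the best hit probability by backward recursion over days. It respects the no-leverage edit above since $f \le 1$:

```python
# Dynamic-programming sketch: at each state (wealth, days left), bet the
# fraction f (from a discrete grid) that maximizes the hit probability.
p = 0.6              # win probability, assumed > 0.5
target = 2.0         # target wealth as a multiple of starting wealth
fracs = [i / 10 for i in range(11)]   # candidate fractions 0.0 .. 1.0
_cache = {}          # rounding the wealth key keeps the state space finite

def hit_prob(wealth, days_left):
    """Maximum probability of reaching `target` within `days_left` days."""
    if wealth >= target:
        return 1.0
    if days_left == 0:
        return 0.0
    key = (round(wealth, 4), days_left)
    if key not in _cache:
        _cache[key] = max(
            p * hit_prob(wealth * (1 + f), days_left - 1) +
            (1 - p) * hit_prob(wealth * (1 - f), days_left - 1)
            for f in fracs)
    return _cache[key]
```

Recording which f achieves the max at each state gives the optimal (discretized) policy, which can then be compared against the Kelly fraction.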

by Andrea at August 24, 2016 10:40 AM


Difference between standardscaler and Normalizer in sklearn.preprocessing

What is the difference between StandardScaler and Normalizer in the sklearn.preprocessing module? Don't they both do the same thing, i.e. remove the mean and scale using the standard deviation?
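
They are different: StandardScaler standardizes each feature (column) to zero mean and unit variance across samples, while Normalizer rescales each sample (row) to unit norm. A numpy sketch of the two operations (not the sklearn implementation itself):

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# StandardScaler: per-COLUMN zero mean and unit variance.
standardized = (X - X.mean(axis=0)) / X.std(axis=0)

# Normalizer: per-ROW unit Euclidean norm.
normalized = X / np.linalg.norm(X, axis=1, keepdims=True)
```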

by rb1992 at August 24, 2016 10:36 AM

Should a neural network be able to have a perfect train accuracy?

The title says it all: should a neural network be able to reach perfect training accuracy? Mine saturates at ~0.9 accuracy, and I am wondering whether that indicates a problem with my network or with the training data.

Training instances: ~4500 sequences with an average length of 10 elements. Network: Bi-directional vanilla RNN with a softmax layer on top.

by Alex at August 24, 2016 10:29 AM

A function that can both compose and chain (dot notation) in Javascript

I'm trying to convert an old API that uses a lot of dot-notation chaining, which needs to be kept, i.e.:

[1,2,3,4].newSlice(1,2).add(1) // [3]

I'd also like to add a functional style of composition; in this example I use Ramda, but lodash or others would be fine:

const sliceAddOne = R.compose(add(1), newSlice(1,2)) 
sliceAddOne([1,2,3,4])// [3]

My question is: how can I support both chaining and composition in my function newSlice? What would this function look like?

I have a little jsBin example.

by cmdv at August 24, 2016 10:22 AM


Conditional Expected Shortfall

I would like to know how to forecast the conditional mean.

I have fitted an AR(1)-GARCH(1,1) to my data and want to estimate the conditional expected shortfall.

by Kebba Bah at August 24, 2016 10:22 AM


Wikileaks has just made a massive blunder and published personal ...

Wikileaks has just made a massive blunder and published personal data. For the first time, they have actually put people in danger, which is what the Pentagon has been accusing them of for years without being able to document even a single case.

What do the pirates in Asterix say? Sic transit gloria mundi. Sigh.

August 24, 2016 10:01 AM


Non-contractual accounts behavioural study

I need to carry out a non-contractual accounts behavioural study for a bank. The objective is to estimate core/non-core ratios and then bucket and FTP them. Any recipe for where to start? I have 3 years of historical data as daily closing balances. From what I googled, I understand that I need some kind of seasonal vs. growth trend segregation, but I have found only guidelines, nothing concrete. Visually, my data (e.g. current accounts) has a very heavy seasonal bias, with highs in the shoulder seasons and lows in the festive seasons. How do I isolate it? How do I then calculate the true core/volatile ratio?

by Peaches at August 24, 2016 09:49 AM



I am getting an error when I run an Azure ML experiment in Excel

Error! {"error":{"code":"LibraryExecutionError","message":"Module execution encountered an internal library error.","details":[{"code":"TableSchemaColumnCountMismatch","target":" (AFx Library)","message":"data: The table column count (0) must match the schema column count (17)."}]}}


Can you help me solve this problem?

Thanks, Smitha

by smitha nidagundi at August 24, 2016 09:26 AM

How to associate two groups of clusters in user-item matrix(a bit like collaborative filtering)?

I have constructed a user-item matrix in Python.

On this matrix, I use k-means to cluster the rows and the columns respectively; I determine k with the x-means algorithm (using the Bayesian Information Criterion). This naturally gives two groups of clusters:

clusters based on rows:

1. WangMing, BaiLi (indicates Chinese)

2. Alice, Bob (indicates American)

3. Sakura, Naruto (indicates Japanese)

clusters based on columns:

1. noodles, dumplings (indicates Chinese food)

2. McDonald's, KFC, Burger King (indicates American fast food)

3. Sushi, salmon (indicates Japanese cuisine)

For my purposes, I want to associate the two groups of clusters. For example, I want this output, given the clustering results above:

WangMing, BaiLi(Chinese) -> noodles, dumplings(Chinese food)

Alice, Bob(American) -> McDonald's, KFC, Burger King(American fast food)

Sakura, Naruto(Japanese) -> Sushi, salmon(Japanese cuisine)

Because the two clusterings are run independently, I don't know how to make this association. I have already constructed the user-item matrix and am new to machine learning, so could you please show me some code, a GitHub project, or some papers about how to handle this problem?

Thank you very much! I really need guidance.
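
One simple association heuristic (my own sketch, not from the question): for each row cluster, average the submatrix restricted to each column cluster, and link it to the column cluster with the highest mean rating:

```python
import numpy as np

# Toy user-item matrix: rows = users, columns = items.
M = np.array([[5, 4, 0, 0],
              [4, 5, 0, 1],
              [0, 0, 5, 4],
              [1, 0, 4, 5]], dtype=float)

row_clusters = {0: [0, 1], 1: [2, 3]}   # e.g. from k-means on rows
col_clusters = {0: [0, 1], 1: [2, 3]}   # e.g. from k-means on columns

# Link each row cluster to the column cluster with the highest mean value.
links = {}
for r, rows in row_clusters.items():
    means = {c: M[np.ix_(rows, cols)].mean() for c, cols in col_clusters.items()}
    links[r] = max(means, key=means.get)
```

On this toy matrix, Chinese-like users link to Chinese-like items, and so on; co-clustering (biclustering) methods address the same goal in one step.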

by tianjianbo at August 24, 2016 09:10 AM



Python: tf-idf-cosine: to find document similarity

I was following a tutorial available at Part 1 & Part 2; unfortunately the author didn't have time for the final section, which involves using cosine similarity to actually find the similarity between two documents. I followed the examples in the article with the help of the following link from Stack Overflow, and I have included the code mentioned in that link just to make answering easy.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from nltk.corpus import stopwords
import numpy as np
import numpy.linalg as LA

train_set = ["The sky is blue.", "The sun is bright."] #Documents
test_set = ["The sun in the sky is bright."] #Query
stopWords = stopwords.words('english')

vectorizer = CountVectorizer(stop_words = stopWords)
#print vectorizer
transformer = TfidfTransformer()
#print transformer

trainVectorizerArray = vectorizer.fit_transform(train_set).toarray()
testVectorizerArray = vectorizer.transform(test_set).toarray()
print 'Fit Vectorizer to train set', trainVectorizerArray
print 'Transform Vectorizer to test set', testVectorizerArray
print transformer.transform(trainVectorizerArray).toarray()
tfidf = transformer.transform(testVectorizerArray)
print tfidf.todense()

As a result of the above code I have the following matrices:

Fit Vectorizer to train set [[1 0 1 0]
 [0 1 0 1]]
Transform Vectorizer to test set [[0 1 1 1]]

[[ 0.70710678  0.          0.70710678  0.        ]
 [ 0.          0.70710678  0.          0.70710678]]

[[ 0.          0.57735027  0.57735027  0.57735027]]

I am not sure how to use this output to calculate cosine similarity. I know how to implement cosine similarity for two vectors of equal length, but here I am not sure how to identify the two vectors.
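
For reference, cosine similarity is $\cos\theta = \frac{a \cdot b}{\|a\|\,\|b\|}$, computed between the query's tf-idf row and each training document's tf-idf row. A numpy sketch using the matrices printed above:

```python
import numpy as np

# tf-idf rows for the two training documents (printed above).
train = np.array([[0.70710678, 0.0, 0.70710678, 0.0],
                  [0.0, 0.70710678, 0.0, 0.70710678]])
# tf-idf row for the query document.
query = np.array([0.0, 0.57735027, 0.57735027, 0.57735027])

# Cosine similarity = dot(a, b) / (|a| * |b|); the rows here already
# have unit norm, so this reduces to a dot product.
sims = train @ query / (np.linalg.norm(train, axis=1) * np.linalg.norm(query))
```

The second training document ("The sun is bright.") comes out more similar to the query, as expected.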

by Null-Hypothesis at August 24, 2016 08:17 AM



Converting code to pure functional form

In a sense, any imperative code can be converted to pure functional form by making every operation receive and pass on a 'state of the world' parameter.

However, suppose you have some code that is almost in pure functional form, except that, buried many layers of function calls deep, are a handful of imperative operations that modify global or at least widely accessed state (e.g. calling a random number generator, updating a counter, printing some debug info), and you want to convert it to pure functional form, algorithmically, with the minimum of changes.

Is there a way to do this without essentially turning the entire program inside out?
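
As a tiny illustration of the state-passing transformation described above (my own Python example): an impure counter becomes a pure function that takes the state in and hands the new state back, and every caller on the path must then thread it through:

```python
# Impure version: mutates a global counter.
counter = 0
def next_id_impure():
    global counter
    counter += 1
    return counter

# Pure version: the relevant piece of "world state" (the counter)
# is passed in and returned explicitly, so every caller must thread it.
def next_id(state):
    new_state = state + 1
    return new_state, new_state   # (value, new state)

v1, s = next_id(0)
v2, s = next_id(s)
```

The threading of `s` through every call site is exactly the "turning inside out" the question worries about; languages with a State monad or algebraic effects automate that plumbing.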

by rwallace at August 24, 2016 07:53 AM



Functional JavaScript, remove assignment

How can I avoid the self variable here?

function urlBuilder(endpoint){
    var status = ["user_timeline", "home_timeline", "retweets_of_me", "show", "retweets",
            "update", "update_with_media", "retweet", "unretweet", "retweeters", "lookup"],
        friendships = ["incoming", "outgoing", "create", "destroy", "update", "show"];
    let endpoints = {
        status: status,
        friendships: friendships
    };

    var self = { };

    endpoints[endpoint].forEach(e => {
        self[e] = endpoint + "/" + e;
    });

    return self;
}

Somewhat better, but still an assignment statement:

return [{}].map(el => {
  endpoints[endpoint].forEach(e => {
    el[e] = endpoint + "/" + e;
  });
  return el;
});
by Mega Man at August 24, 2016 07:15 AM


Algorithm to optimize the auctioning profit [on hold]

I am trying to come up with an optimal algorithm for an auctioning system where people who want to buy a set of items can collectively place bid for them.

Input: N items, along with M bids from people; each bid is of the form [price; list of items]

Output: Maximize the total profit of the auction holder from the auction

By profit, I mean the sum of the accepted bid values. The condition on any two accepted bids is that their lists of items must not have any item in common.

I thought of some greedy solutions but they didn't work very well.

Which algorithms can be used here? Are there any algorithms which can fairly optimize the profit, if not maximize it (for example, given a maximum time for the algorithm to run)?

By an algorithm that fairly optimizes the profit, I mean something like hill climbing, where this problem could be modeled as a greedy local search that might not give the maximum profit but would still give a fairly decent profit (at a local maximum) in less time.

EDIT: This is an example to demonstrate the problem.

Input : N = 6 and following are the bids from 4 people -

Person 1 : 3000 for 0, 1, 4

Person 2 : 2000 for 0, 1, 5

Person 3 : 1000.5 for 2, 3

Person 4 : 1525.75 for 0, 1, 2, 3, 4, 5

Then picking the bids from persons 1 and 3 would give the maximum profit (4000.5) to the auction holder. (When the number of bids becomes too high, even an approximate algorithm that fairly optimizes, if not maximizes, this profit would do.)
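
This is an instance of weighted set packing, which is NP-hard in general; that is why greedy attempts struggle. For a small number of bids, the example above can be checked by brute force over subsets (a hedged sketch, exponential in M, not a practical algorithm):

```python
from itertools import combinations

# The 4 bids from the example: (price, set of items).
bids = [(3000.0, {0, 1, 4}),
        (2000.0, {0, 1, 5}),
        (1000.5, {2, 3}),
        (1525.75, {0, 1, 2, 3, 4, 5})]

def best_profit(bids):
    best = 0.0
    for r in range(1, len(bids) + 1):
        for subset in combinations(bids, r):
            items = [s for _, s in subset]
            # Accept only subsets whose item sets are pairwise disjoint.
            if sum(len(s) for s in items) == len(set().union(*items)):
                best = max(best, sum(p for p, _ in subset))
    return best
```

For larger M, integer programming or local search (simulated annealing over accepted-bid sets) are the usual practical routes.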

by ralphsol at August 24, 2016 07:05 AM



How to machine learn (Tree) on several attributes?

I am using Python and scikit-learn's tree classifier on a small fictive machine learning problem. I have a binary outcome variable (wc_measure) and I believe it depends on a few other variables (cash, crisis, and industry). I tried the following:

#   import neccessary packages
import pandas as pd
import numpy as np
import sklearn as skl
from sklearn import tree
from sklearn.cross_validation import train_test_split as tts

#   import data and give a little overview
sample = pd.read_stata('sample_data.dta')

s = sample

#   What I want to learn on
X = [s.crisis,, s.industry]
y = s.wc_measure
X_train, X_test, y_train, y_test = tts(X, y, test_size = .5)

#let's learn a little

my_tree = tree.DecisionTreeClassifier()
clf =, y_train)
predictions = my_tree.predict(X_test)

I get the following error: Number of labels=50 does not match number of samples=1. If I base X on a single variable (e.g. X = s.crisis) I am asked to reshape X. I don't fully understand why I have either of these issues... Ideas?

PS: This is the return of print(X)

[0     4.0
1     4.0
2     5.0
3     3.0
4     4.0
5     2.0
6     2.0
7     1.0
8     3.0
9     3.0
10    4.0
11    3.0
12    2.0
13    4.0
14    5.0
15    4.0
16    2.0
17    2.0
18    3.0
19    2.0
20    5.0
21    4.0
22    2.0
23    4.0
24    5.0
25    1.0
26    5.0
27    3.0
28    4.0
29    2.0
70    1.0
71    4.0
72    4.0
73    1.0
74    4.0
75    3.0
76    4.0
77    2.0
78    2.0
79    5.0
80    2.0
81    3.0
82    5.0
83    4.0
84    4.0
85    5.0
86    3.0
87    3.0
88    4.0
89    2.0
90    2.0
91    3.0
92    3.0
93    4.0
94    3.0
95    1.0
96    4.0
97    2.0
98    3.0
99    4.0
Name: crisis, dtype: float32, 0      450.283417
1      113.472214
2       11.811784
3     1007.507446
4      293.895142
5     1133.297729
6     2237.830322
7     1475.787109
8      283.363678
9      626.888794
10      38.865730
11     991.999390
12    1115.746948
13     373.537231
14      97.570717
15     136.079193
16    2560.691406
17     667.062073
18    1378.384521
19     152.716400
20       5.779267
21     481.511566
22     677.809631
23     722.521790
24      32.927990
25    2504.450928
26      17.422865
27     651.585083
28     549.469177
29     297.458527
70    1198.370239
71     471.343933
72     389.709290
73    2962.622803
74     581.519287
75    1148.822388
76      67.653664
77    1346.391602
78    1764.086914
79      14.308219
80     973.152161
81     552.576904
82       2.863116
83     425.520752
84     321.773682
85      63.597332
86    1351.122559
87     735.856567
88     745.656677
89    2784.453125
90    1438.272705
91     768.780823
92     827.021423
93     591.778015
94     885.169434
95    1143.088867
96     399.816803
97    1517.454834
98    1311.692505
99     533.062561
Name: cash, dtype: float32, 0     5.0
1     2.0
2     3.0
3     5.0
4     4.0
5     3.0
6     5.0
7     1.0
8     1.0
9     2.0
10    1.0
11    5.0
12    2.0
13    4.0
14    6.0
15    2.0
16    6.0
17    2.0
18    5.0
19    1.0
20    3.0
21    4.0
22    2.0
23    6.0
24    4.0
25    4.0
26    3.0
27    3.0
28    5.0
29    1.0
70    2.0
71    4.0
72    3.0
73    6.0
74    6.0
75    5.0
76    1.0
77    3.0
78    5.0
79    4.0
80    2.0
81    3.0
82    2.0
83    5.0
84    3.0
85    5.0
86    5.0
87    4.0
88    6.0
89    6.0
90    4.0
91    3.0
92    4.0
93    6.0
94    3.0
95    2.0
96    3.0
97    4.0
98    6.0
99    4.0

PPS: Here is how I generate the data in Stata:

clear matrix
clear all
set more off

set obs 100
gen id = _n

    gen industry = round(runiform()*5+1)
    gen activity = round(runiform()*5+1)
    gen crisis = round(runiform()*4+1)
        egen min_crisis = min(crisis)
        egen max_crisis = max(crisis)
        gen n_crisis = (crisis - min_crisis)/(max_crisis-min_crisis)

*Company details
    gen staff = round((0.5 * industry + 0.3 * activity - 0.2 * crisis) * runiform()*100+1) 

    gen revenue = (0.5 * industry + 0.2 * activity - 0.3 * crisis ) * 1000 + runiform()
        replace revenue = 0 if revenue<0

    *Working Capital (wc)
    gen stock = runiform()*0.5*crisis*revenue
    gen receivables = runiform()*0.5*crisis*revenue
    gen payables = runiform()*-0.5*crisis*revenue
        replace payables = 0 if payables < 0
    gen wc = stock + receivables - payables 
        egen avg_wc = mean(wc), by(industry)

    gen loan = (0.5 * industry + 0.2 * activity - 0.3 * crisis ) * 1000 + runiform()
        replace loan = 0 if loan<0
        egen pc_loan = pctile(loan), p(0.2) by(industry)
        replace loan = 0 if loan<pc_loan

    gen current_debt = n_crisis * loan + runiform()*100

    gen cash = (1-n_crisis)*revenue + runiform()*100


    *WC-measure (binary)
        gen wc_status = (wc-avg_wc)
            egen max_wc_status = max(wc_status), by(industry)
            egen min_wc_status = min(wc_status), by(industry)
            gen n_wc_status = (wc_status - min_wc_status) / (max_wc_status-min_wc_status)
    gen wc_measure = round(n_wc_status)

by Rachel Sleeps at August 24, 2016 06:57 AM

train logistic regression model with different feature dimension in scikit learn

Using Python 2.7 on Windows. I want to fit a logistic regression model using features T1 and T2 for a classification problem; the target is T3.

I show the values of T1 and T2 below, as well as my code. The question is: since each sample's T1 feature has 4 components while its T2 feature has 1, how should we pre-process them so that they can be leveraged correctly by scikit-learn's logistic regression training?

To clarify: for training sample 1, its T1 feature is [ 0 -1 -2 -3] and its T2 feature is [0]; for training sample 2, its T1 feature is [ 1  0 -1 -2] and its T2 feature is [1]; and so on.

import numpy as np
from sklearn import linear_model, datasets

arc = lambda r,c: r-c
T1 = np.array([[arc(r,c) for c in xrange(4)] for r in xrange(5)])
print T1
print type(T1)
T2 = np.array([[arc(r,c) for c in xrange(1)] for r in xrange(5)])
print T2
print type(T2)
T3 = np.array([0,0,1,1,1])

logreg = linear_model.LogisticRegression(C=1e5)

# we create an instance of the classifier and fit the data,
# using T1 and T2 as features, and T3 as target, T3)


[[ 0 -1 -2 -3]
 [ 1  0 -1 -2]
 [ 2  1  0 -1]
 [ 3  2  1  0]
 [ 4  3  2  1]]
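
One standard preprocessing (a sketch of the usual approach, not necessarily the only one): concatenate the feature blocks column-wise, so that row i holds all of sample i's features:

```python
import numpy as np

arc = lambda r, c: r - c
T1 = np.array([[arc(r, c) for c in range(4)] for r in range(5)])  # shape (5, 4)
T2 = np.array([[arc(r, c) for c in range(1)] for r in range(5)])  # shape (5, 1)

# Column-wise concatenation: sample i's row is its T1 features
# followed by its T2 feature, giving one (5, 5) design matrix.
X = np.hstack([T1, T2])
```

The combined X can then be passed to `, T3)` directly, since scikit-learn expects one 2-D array with one row per sample.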



by Lin Ma at August 24, 2016 06:27 AM


Why are there two different `until` ($\cup$) semantics in Timed Computation Tree Logic?

In the book Principles of Model Checking (Christel Baier and Joost-Peter Katoen, MIT Press), Section 9.2, page 701, the semantics of the until modality is defined over a time-divergent path $\pi = s_0 \Rightarrow^{d_0} s_1 \Rightarrow^{d_1} \cdots \Rightarrow^{d_{i-1}} s_i \Rightarrow^{d_i} \cdots$ as follows:

(You can skip the formal definition on a first reading.)
$\pi \models \Phi \cup^{J} \Psi \iff$
$\exists i \ge 0. s_i + d \models \Psi \textrm{ for some } d \in [0,d_i] \textrm{ with } \sum_{k=0}^{i-1}d_k + d \in J \textrm{ and }$ $\forall j \le i. s_j + d' \models \Phi \lor \Psi \textrm{ for any } d' \in [0,d_j] \textrm{ with } \sum_{k=0}^{j-1} d_k + d' \le \sum_{k=0}^{i-1} d_k + d$.

Intuitively, a time-divergent path $\pi = s_0 \Rightarrow^{d_0} s_1 \Rightarrow^{d_1} \cdots \Rightarrow^{d_{i-1}} s_i \Rightarrow^{d_i} \cdots$ satisfies $\Phi \cup^{J} \Psi$ whenever, at some time point in $J$, a state satisfying $\Psi$ is reached, and at all previous time instants $\Phi \lor \Psi$ holds.

However, in the book Model Checking by E. M. Clarke et al. (Section 16.3, page 256), the semantics of the until modality is given as follows:

$s \models E[\Phi \cup_{[a,b]} \Psi]$ if and only if there exists a path $\pi = s_0 s_1 s_2 \cdots$ starting at $s = s_0$ and some $i$ such that $a \le i \le b$ and $s_i \models \Psi$ and for all $j < i, s_j \models \Phi$.

As indicated, the second definition is stricter than the first one in that it does not allow the case of $\lnot \Phi \land \Psi$ before reaching a state satisfying $\Psi$.


  1. Why are there two different until ($\cup$) semantics in Timed Computation Tree Logic (TCTL)?

  2. Which one is more official?

by hengxin at August 24, 2016 06:21 AM




Should I prefer joined() or flatMap(_:) in Swift 3?

Swift 3 recently added joined(). I'm curious about the performance characteristics of these two ways of flattening an array:

let array = [[1,2,3],[4,5,6],[7,8,9]]
let j = Array(array.joined())
let f = array.flatMap({$0})

They both flatten the nested array into [1, 2, 3, 4, 5, 6, 7, 8, 9]. Should I prefer one over the other for performance? Also, is there a more readable way to write the calls?

by Ben Morrow at August 24, 2016 05:38 AM


Can random suitless $52$ playing card data be compressed to approach, match, or even beat entropy encoding storage? If so, how?

I have real data I am using for a simulated card game. I am only interested in the ranks of the cards, not the suits; however, it is a standard $52$ card deck, so there are only $4$ of each rank in the deck. The deck is shuffled well for each hand, and then I output the entire deck to a file, so there are only $13$ possible symbols in the output file: $2,3,4,5,6,7,8,9,T,J,Q,K,A$ ($T$ = ten). Of course we can bitpack these using $4$ bits per symbol, but then we are wasting $3$ of the $16$ possible encodings. We can do better if we group $4$ symbols at a time and then compress them, because $13^4 = 28{,}561$, which fits rather "snugly" into $15$ bits instead of $16$. The theoretical bitpacking limit is $\log(13)/\log(2) = 3.70044$ bits per symbol for data in which each card is one of $13$ equally likely symbols. However, we cannot have $52$ kings, for example, in this deck: we MUST have exactly $4$ of each rank in each deck, so the entropy drops by about half a bit per symbol, to about $3.2$.

Ok, so here is what I am thinking. This data is not totally random: we know there are $4$ of each rank in each block of $52$ cards (call it a shuffled deck), so we can make several assumptions and optimizations. One of them is that we do not have to encode the very last card, because we will know what it should be. Another saving arises if we end on a single rank: for example, if the last $3$ cards in the deck are $777$, we wouldn't have to encode them, because the decoder would be counting cards up to that point, see that all the other ranks have been filled, and assume the $3$ "missing" cards are all $7$s.

So my question to this site is, what other optimizations are possible to get an even smaller output file on this type of data, and if we use them, can we ever beat the theoretical (simple) bitpacking entropy of $3.70044$ bits per symbol, or even approach the ultimate entropy limit of about $3.2$ bits per symbol on average? If so, how?

When I use a ZIP type program (WinZip for example), I only see about a $2:1$ compression, which tells me it is just doing a "lazy" bitpack to $4$ bits. If I "pre-compress" the data using my own bitpacking, it seems to like that better, because then when I run that through a zip program, I am getting a little over $2:1$ compression. What I am thinking is, why not do all the compression myself (because I have more knowledge of the data than the Zip program does). I am wondering if I can beat the entropy "limit" of log($13$)/log($2$) = $3.70044$. I suspect I can with the few "tricks" I mentioned and a few more I can probably find out. The output file of course does not have to be "human readable". As long as the encoding is lossless it is valid.

Here is a link to $3$ million human readable shuffled decks ($1$ per line). Anyone can "practice" on a small subset of these lines and then let it rip on the entire file. I will keep updating my best (smallest) filesize based on this data.

By the way, in case you are interested in what type of card game this data is used for, here is the link to my active question (with $300$ point bounty). I am being told it is a hard problem to solve (exactly) since it would require a huge amount of data storage space. Several simulations agree with the approximate probabilities though. No purely mathematical solutions have been provided (yet). It's too hard, I guess.

I have a good algorithm that takes $168$ bits to encode the first deck in my sample data. This data was generated randomly using the Fisher-Yates shuffle algorithm, so it is truly random data, and my newly created algorithm seems to be working VERY well, which makes me happy.

Regarding the compression "challenge", I can give you more information. I have my 3 million decks stored in a database so I can run queries against them. I immediately noticed that the first 4 cards are the same for a few dozen decks each, so many decks (for example) start with 2222 after sorting. That means this information is redundant: it can be stored once, and then only the changing parts of those decks stored with it. There should be a significant saving in bits per deck with this scheme, since the normal overhead of the first 4 cards is 3.7 bits per card, or 14.8 bits in total. We would have to give 1 bit back to tell the decoder whether we are encoding an "abbreviated" deck or a full deck, so I am expecting maybe 13 or so bits saved per deck on average, thus putting it BELOW the 166 bits per deck "limit". However, I do not yet have all 3 million bit patterns from the output of my packing scheme, so I don't yet know whether the redundancy will be similar to that of the raw data. I suspect it will be similar, but not identical.
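
The two entropy figures used throughout the question can be checked directly: since the $4$ copies of each rank are indistinguishable, the number of distinct decks is $52!/(4!)^{13}$. A short sketch, using the log-gamma function for log-factorials:

```python
import math

def log2_fact(n):
    # log2(n!) via the log-gamma function; precise enough for this purpose.
    return math.lgamma(n + 1) / math.log(2)

# Naive per-symbol entropy: 13 equally likely rank symbols.
naive_bits_per_card = math.log(13, 2)               # ~3.70044 bits

# True deck entropy: log2(52! / (4!)^13), because the four copies
# of each rank are interchangeable.
bits_per_deck = log2_fact(52) - 13 * log2_fact(4)   # ~166 bits
bits_per_card = bits_per_deck / 52                  # ~3.19 bits
```

This ~166-bit figure is the average-case floor for any lossless scheme over uniformly shuffled decks; individual decks can compress below it only at the expense of others.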

by David James at August 24, 2016 05:31 AM

Dynamic Programming

I am learning algorithms, but I got stuck at dynamic programming. Theoretically I get the idea, but I am unable to implement it; I find it difficult to identify the recursion for problems. Can someone please suggest a way to approach dynamic programming problems?
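
One practice pattern that may help (my own example, not from the question): first write the recurrence down in words, then translate it directly into a recursive function and memoize it. For the minimum number of coins summing to an amount, the recurrence is min_coins(a) = 1 + min over denominations c <= a of min_coins(a - c):

```python
from functools import lru_cache

coins = (1, 3, 4)   # example denominations

@lru_cache(maxsize=None)
def min_coins(amount):
    """Fewest coins summing to `amount`: direct recurrence + memoization."""
    if amount == 0:
        return 0
    return 1 + min(min_coins(amount - c) for c in coins if c <= amount)
```

Note that min_coins(6) is 2 (3+3), where a greedy choice of the largest coin first would use 3 coins (4+1+1). Once the memoized version works, it can be rewritten bottom-up as a table fill, which is the usual final DP form.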

by Kaushal at August 24, 2016 05:20 AM


AUC calculation in decision tree in scikit-learn

Using scikit-learn with Python 2.7 on Windows, what is wrong with my code to calculate AUC? Thanks.

from sklearn.datasets import load_iris
from sklearn.cross_validation import cross_val_score
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()
#print cross_val_score(clf,,, cv=10, scoring="precision")
#print cross_val_score(clf,,, cv=10, scoring="recall")
print cross_val_score(clf,,, cv=10, scoring="roc_auc")

Traceback (most recent call last):
  File "C:/Users/foo/PycharmProjects/CodeExercise/", line 8, in <module>
    print cross_val_score(clf,,, cv=10, scoring="roc_auc")
  File "C:\Python27\lib\site-packages\sklearn\", line 1433, in cross_val_score
    for train, test in cv)
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\", line 800, in __call__
    while self.dispatch_one_batch(iterator):
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\", line 658, in dispatch_one_batch
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\", line 566, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\", line 180, in __init__
    self.results = batch()
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "C:\Python27\lib\site-packages\sklearn\", line 1550, in _fit_and_score
    test_score = _score(estimator, X_test, y_test, scorer)
  File "C:\Python27\lib\site-packages\sklearn\", line 1606, in _score
    score = scorer(estimator, X_test, y_test)
  File "C:\Python27\lib\site-packages\sklearn\metrics\", line 159, in __call__
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported
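For context: in this version of scikit-learn the roc_auc scorer is defined only for binary targets, and iris has three classes, hence the ValueError. One workaround is to restrict the data to two classes before scoring; the sketch below uses small made-up arrays standing in for iris.data / iris.target:

```python
# Hypothetical stand-ins for iris.data / iris.target.
X = [[5.1, 3.5], [7.0, 3.2], [6.3, 3.3], [4.9, 3.0], [5.8, 2.7]]
y = [0, 1, 2, 0, 2]

# Keep only classes 0 and 1 so the target is binary.
keep = [label in (0, 1) for label in y]
X2 = [row for row, k in zip(X, keep) if k]
y2 = [label for label, k in zip(y, keep) if k]
print(y2)  # [0, 1, 0] -- a binary target the roc_auc scorer accepts
```

With `iris.data[keep]` / `iris.target[keep]` (as NumPy boolean masks) the same idea applies directly to the cross_val_score call above.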

by Lin Ma at August 24, 2016 05:06 AM


Reminder: Early registration for EuroBSDcon 2016 ends Aug 24

EuroBSDcon 2016 (see earlier article) is on from 22 to 25 September 2016, in Belgrade, Serbia.

Early registration ends 2016-08-24 23:59 CEST, so get in now for discounted prices on great (Open)BSD talks and tutorials!

August 24, 2016 04:46 AM


scikit learn logistic regression precision calculation weird warning

Using scikit-learn with Python 2.7 on Windows. Here is my code; it produces no warning if I change precision to precision_weighted for the scoring parameter. But I do not know what the warning means, or why I have to explicitly specify average as one of (None, 'micro', 'macro', 'weighted', 'samples'). Actually, in my case I want to treat all samples with equal weight, but it seems there is no such option among the 5 choices?

from sklearn import linear_model, datasets
from sklearn.cross_validation import cross_val_score

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features.
Y = iris.target

h = .02  # step size in the mesh

logreg = linear_model.LogisticRegression(C=1e5)

# we create an instance of the classifier and fit the data.
logreg.fit(X, Y)

print cross_val_score(logreg, X, Y, cv=10, scoring="precision")
#print cross_val_score(logreg, X, Y, cv=10, scoring="precision_weighted")

Warning message,

C:\Python27\lib\site-packages\sklearn\metrics\ DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
[ 0.66666667  0.80555556  0.9047619   0.86666667  0.80555556  0.875
  0.94444444  0.80555556  0.82222222  0.80555556]
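For what it's worth, the averaging options differ in what they weight equally: 'macro' gives every class equal weight, 'weighted' weights each class by its frequency, and 'micro' effectively gives every sample equal weight (which sounds closest to what is being asked for). A from-scratch comparison with made-up labels (not the iris run above):

```python
from collections import Counter

def per_class_precision(y_true, y_pred):
    # Precision for class c = fraction of predictions "c" that are correct.
    prec = {}
    for c in sorted(set(y_true)):
        predicted_c = [t for t, p in zip(y_true, y_pred) if p == c]
        prec[c] = (sum(t == c for t in predicted_c) / len(predicted_c)
                   if predicted_c else 0.0)
    return prec

y_true = [0, 0, 0, 0, 1, 2]  # imbalanced: class 0 dominates
y_pred = [0, 0, 0, 1, 1, 2]
p = per_class_precision(y_true, y_pred)

macro = sum(p.values()) / len(p)                            # each class equal
support = Counter(y_true)
weighted = sum(p[c] * support[c] for c in p) / len(y_true)  # by class frequency
print(p, macro, weighted)  # the two averages disagree on imbalanced labels
```

On balanced labels macro and weighted coincide, which is why the choice only matters once class frequencies differ.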

by Lin Ma at August 24, 2016 04:43 AM



Are Gaussian clusters linearly separable?

Imagine you have two Gaussian probability distributions in two dimensions. The first is centered at (0,1) and the second at (0,-1). (For simplicity, assume they have the same variance.) Can one consider the clusters of data points sampled from these two Gaussians to be linearly separable?

Intuitively, it's clear that the boundary separating the two distributions is linear, namely the abscissa in our case. However, the formal requirement for linear separability is that the convex hulls of the clusters do not overlap. This cannot be the case with Gaussian-generated clusters since their underlying probability distributions pervade all of R^2 (albeit with negligible probabilities far away from the mean).

So, are Gaussian-generated clusters linearly separable? How can one reconcile the requirement on convex hulls with the fact that a straight line is the only conceivable "boundary"? Or does the boundary perhaps effectively cease to be linear once non-equal variances come into the picture?
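Both points — a linear optimal boundary, yet overlapping samples — can be checked empirically. A small sketch, classifying by the sign of the second coordinate (the optimal rule for equal variances, i.e. the abscissa as boundary):

```python
import random

random.seed(0)
n = 2000
# Samples from N((0, 1), I) and N((0, -1), I).
a = [(random.gauss(0, 1), random.gauss(1, 1)) for _ in range(n)]
b = [(random.gauss(0, 1), random.gauss(-1, 1)) for _ in range(n)]

# Decision rule: predict cluster A iff the y-coordinate is >= 0.
err_a = sum(1 for _, y in a if y < 0) / n   # cluster A points below the line
err_b = sum(1 for _, y in b if y >= 0) / n  # cluster B points above it
print(err_a, err_b)  # both strictly positive: the convex hulls overlap
```

Each error rate comes out near Phi(-1) ~ 0.16, so the sampled clusters are not linearly separable in the strict convex-hull sense, even though no nonlinear boundary can do better in expectation.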

by Tfovid at August 24, 2016 03:55 AM


periodicity extraction of cubic lattice substitutional system

I would like to ask what the state-of-the-art methods are for extracting periodicities from a lattice substitutional system.

Given a 12 by 12 by 12 periodic cubic system, where each grid point holds a number, say 1 or 0, what method could be used to extract an "approximate subsystem" that has a much lower periodicity?

by user40780 at August 24, 2016 03:37 AM

How to Determine Existence of Turing Reducible Languages?

Are finite and recursive instances of $L$ possible with the following constraints?

$L \subseteq \{0,1\}^*$ and $L \leq HPL \leq \overline L$, where $\overline L = \{x \in \{0,1\}^* : x \notin L\}$. $HPL$ denotes the halting problem language.

  1. Does there exist a finite language $L$ that meets the above constraints? In other words, given the above information, is it possible that $L$ is finite? I'm thinking that a finite language $L$ is recursive and thus can be reduced to $HPL$, a recursively enumerable language.

  2. Does there exist a recursive language $L$ that meets the above constraints? In other words, given the above information, is it possible that $L$ is recursive?

by Brandeis King at August 24, 2016 03:30 AM



Approximating set cover when it is known that an exact set cover exists

Suppose $U = \{1, 2, \cdots, n\}$ is a universe and $\mathcal S = \{S_1, S_2, \cdots, S_m\}$ is a collection of sets such that each set contains exactly $c$ elements, where $c$ is a constant.

In this case, a $c$-approximation is easy. It is also possible to improve that to a $(\ln c + 1)$-approximation.

My question is the following:
Suppose that, along with this special set cover instance, you are told that an exact cover (of size $n/c$) exists. Is it possible to get a better approximation factor? What is known about the hardness of approximation in this case?
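For reference, the approximation factors mentioned come from the standard greedy algorithm, which repeatedly picks the set covering the most uncovered elements. A minimal sketch on an illustrative 3-uniform instance:

```python
def greedy_set_cover(universe, sets):
    """Greedy: repeatedly take the set covering the most uncovered elements.
    Gives a (ln n + 1)-approximation in general, (ln c + 1) for c-uniform sets."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(sets, key=lambda s: len(s & uncovered))
        if not best & uncovered:
            raise ValueError("instance has no cover")
        cover.append(best)
        uncovered -= best
    return cover

# c = 3, n = 6; an exact cover of size n/c = 2 exists ({1,2,3} and {4,5,6}).
U = set(range(1, 7))
S = [{1, 2, 3}, {4, 5, 6}, {1, 4, 5}, {2, 3, 6}]
cover = greedy_set_cover(U, S)
print(len(cover))  # 2 on this instance
```

Greedy happens to find the exact cover here, but it is not told that one exists; the question above is precisely whether that promise can be exploited for a better worst-case factor.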

by taninamdar at August 24, 2016 02:14 AM


How do I read the output of sparsenn on my test set?

After running sparsenn on my training set, I get this output:

    pass 0 tacc 0.54629 sacc 0.54629 trms 0.99939 srms 0.99939 tauc 0.50530 sauc 0.50530 ( [ { 
    pass 10 tacc 0.54629 sacc 0.54629 trms 0.99600 srms 0.99600 tauc 0.50550 sauc 0.50550 ) [ { 
    pass 20 tacc 0.54629 sacc 0.54629 trms 0.99569 srms 0.99569 tauc 0.50286 sauc 0.50286 ) [ } 
    pass 30 tacc 0.54629 sacc 0.54629 trms 0.99572 srms 0.99572 tauc 0.50526 sauc 0.50526 ) ] } 
    pass 40 tacc 0.54629 sacc 0.54629 trms 0.99573 srms 0.99573 tauc 0.50539 sauc 0.50539 ) ] }     

which is alright. Now after I run this on my test set, I get an output with a number on each line - here's a sample:


I have 30000 lines in my output. What do these numbers represent? How do I obtain the AUC from them? I would ideally like to generate the ROC curve.

 ./nnlearn -e 170 -h 32 -r 0.1 nnoutput90.txt nnoutput90.txt qwerty
 ./nnclassify nntest.txt qwerty.auc p.txt

This is the code I'm running.
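I don't know sparsenn's output format for certain, but if those 30000 numbers are per-example scores, then given the true binary labels the AUC has a direct rank interpretation. A self-contained sketch with made-up scores and labels:

```python
def binary_auc(scores, labels):
    """AUC = probability that a uniformly random positive example is scored
    above a uniformly random negative one (ties count as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(binary_auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # 1.0: perfect ranking
print(binary_auc([0.9, 0.2, 0.8, 0.3], [1, 1, 0, 0]))  # 0.5: chance level
```

The ROC curve itself is traced by sweeping a threshold over the scores and plotting true-positive rate against false-positive rate; the AUC is the area under that curve, and equals the pairwise probability computed above.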

by tvishwa107 at August 24, 2016 01:58 AM



How to automatically get all options data for a particular stock into microsoft excel?

I'm looking for a way to get the entire options chain (all options expiries) for a particular stock into Excel without manually copy-pasting anything. It does not have to be real time and I will only be searching for one stock's options at a time. A free method would be ideal.

by RageAgainstheMachine at August 24, 2016 01:44 AM

arXiv Networking and Internet Architecture

Mobile and Residential INEA Wi-Fi Hotspot Network. (arXiv:1608.06606v1 [cs.NI])

Since 2012 INEA has been developing and expanding a network of IEEE 802.11 compliant Wi-Fi hotspots (access points) located across the Greater Poland region. This network consists of 330 mobile (vehicular) access points carried by public buses and trams and over 20,000 fixed residential hotspots distributed throughout the homes of INEA customers to provide Internet access via the "community Wi-Fi" service. This paper is aimed at sharing the insights gathered by INEA throughout 4 years of experience in providing hotspot-based Internet access. The emphasis is put on daily and hourly trends in order to evaluate user experience, to determine key patterns, and to investigate influences such as public transportation trends, user location and mobility, as well as radio frequency noise and interference.

by <a href="">Bartosz Musznicki</a>, <a href="">Karol Kowalik</a>, <a href="">Piotr Ko&#x142;odziejski</a>, <a href="">Eugeniusz Grzybek</a> at August 24, 2016 01:30 AM

Application of Public Ledgers to Revocation in Distributed Access Control. (arXiv:1608.06592v1 [cs.CR])

There has recently been a flood of interest in potential new applications of blockchains, as well as proposals for more generic designs called public ledgers. Most of the novel proposals have been in the financial sector. However, the public ledger is an abstraction that solves several of the fundamental problems in the design of secure distributed systems: global time in the form of a strict linear order of past events, globally consistent and immutable view of the history, and enforcement of some application-specific safety properties. This paper investigates the applications of public ledgers to access control and, more specifically, to group management in distributed systems where entities are represented by their public keys and authorization is encoded into signed certificates. It is particularly difficult to handle negative information, such as revocation of certificates or group membership, in the distributed setting. The linear order of events and global consistency simplify these problems, but the enforcement of internal constraints in the ledger implementation often presents problems. We show that different types of revocation require slightly different properties from the ledger. We compare the requirements with Bitcoin, the best known blockchain, and describe an efficient ledger design for membership revocation that combines ideas from blockchains and from web-PKI monitoring. While we use certificate-based group-membership management as the case study, the same ideas can be applied more widely to rights revocation in distributed systems.

by <a href="">Thanh Bui</a>, <a href="">Tuomas Aura</a> at August 24, 2016 01:30 AM

Syntax and analytic semantics of LISA. (arXiv:1608.06583v1 [cs.PL])

We provide the syntax and semantics of the LISA (for "Litmus Instruction Set Architecture") language. The parallel assembly language LISA is implemented in the herd7 tool (this http URL) for simulating weak consistency models.

by <a href="">Jade Alglave</a>, <a href="">Patrick Cousot</a> at August 24, 2016 01:30 AM

Boosting PLC Networks for High-Speed Ubiquitous Connectivity in Enterprises. (arXiv:1608.06574v1 [cs.NI])

Powerline communication (PLC) provides inexpensive, secure and high speed network connectivity, by leveraging the existing power distribution networks inside the buildings. While PLC technology has the potential to improve connectivity and is considered a key enabler for sensing, control, and automation applications in enterprises, it has been mainly deployed for improving connectivity in homes. Deploying PLCs in enterprises is more challenging since the power distribution network is more complex as compared to homes. Moreover, existing PLC technologies such as HomePlug AV have not been designed for and evaluated in enterprise deployments. In this paper, we first present a comprehensive measurement study of PLC performance in enterprise settings, by analyzing PLC channel characteristics across space, time, and spectral dimensions, using commodity HomePlug AV PLC devices. Our results uncover the impact of distribution lines, circuit breakers, AC phases and electrical interference on PLC performance. Based on our findings, we show that careful planning of PLC network topology, routing and spectrum sharing can significantly boost performance of enterprise PLC networks. Our experimental results show that multi-hop routing can increase throughput performance by 5x in scenarios where direct PLC links perform poorly. Moreover, our trace driven simulations for multiple deployments, show that our proposed fine-grained spectrum sharing design can boost the aggregated and per-link PLC throughput by more than 20% and 100% respectively, in enterprise PLC networks.

by <a href="">Kamran Ali</a>, <a href="">Ioannis Pefkianakis</a>, <a href="">Alex X. Liu</a>, <a href="">Kyu-Han Kim</a> at August 24, 2016 01:30 AM

High-Quality Synthesis Against Stochastic Environments. (arXiv:1608.06567v1 [cs.LO])

In the classical synthesis problem, we are given an LTL formula psi over sets of input and output signals, and we synthesize a transducer that realizes psi. One weakness of automated synthesis in practice is that it pays no attention to the quality of the synthesized system. Indeed, the classical setting is Boolean: a computation satisfies a specification or does not satisfy it. Accordingly, while the synthesized system is correct, there is no guarantee about its quality. In recent years, researchers have considered extensions of the classical Boolean setting to a quantitative one. The logic LTL[F] is a multi-valued logic that augments LTL with quality operators. The satisfaction value of an LTL[F] formula is a real value in [0,1], where the higher the value is, the higher is the quality in which the computation satisfies the specification.

Decision problems for LTL become search or optimization problems for LTL[F]. In particular, in the synthesis problem, the goal is to generate a transducer that satisfies the specification in the highest possible quality.

Previous work considered the worst-case setting, where the goal is to maximize the quality of the computation with the minimal quality. We introduce and solve the stochastic setting, where the goal is to generate a transducer that maximizes the expected quality of a computation, subject to a given distribution of the input signals. Thus, rather than being hostile, the environment is assumed to be probabilistic, which corresponds to many realistic settings. We show that the problem is 2EXPTIME-complete, like classical LTL synthesis, and remains so in two extensions we consider: one that maximizes the expected quality while guaranteeing that the minimal quality is, with probability $1$, above a given threshold, and one that allows assumptions on the environment.

by <a href="">Shaull Almagor</a>, <a href="">Orna Kupferman</a> at August 24, 2016 01:30 AM

Improving FPGA resilience through Partial Dynamic Reconfiguration. (arXiv:1608.06559v1 [cs.DC])

This paper explores advances in reconfiguration properties of SRAM-based FPGAs, namely Partial Dynamic Reconfiguration, to improve the resilience of critical systems that take advantage of this technology. Commercial off-the-shelf state-of-the-art FPGA devices use SRAM cells for the configuration memory, which allow an increase in both performance and capacity. The fast access times and unlimited number of writes of this technology reduce reconfiguration delays and extend the device lifetime but, at the same time, make these devices more sensitive to radiation effects, in the form of Single Event Upsets. To overcome this limitation, manufacturers have proposed a few fault tolerant approaches, which rely on space/time redundancy and configuration memory content recovery - scrubbing. In this paper, we first present radiation effects on these devices and investigate the applicability of the most commonly used fault tolerant approaches, and then propose an approach to improve FPGA resilience, through the use of a less intrusive failure prediction scrubbing. It is expected that this approach relieves the system designer from dependability concerns and reduces both time intrusiveness and overall power consumption.

by <a href="">Jose Luis Nunes</a> at August 24, 2016 01:30 AM

Robust Flows over Time: Models and Complexity Results. (arXiv:1608.06520v1 [cs.DM])

We study dynamic network flows with uncertain input data under a robust optimization perspective. In the dynamic maximum flow problem, the goal is to maximize the flow reaching the sink within a given time horizon $T$, while flow requires a certain travel time to traverse an arc.

In our setting, we account for uncertain travel times of flow. We investigate maximum flows over time under the assumption that at most $\Gamma$ travel times may be prolonged simultaneously due to delay. We develop and study a mathematical model for this problem. As the dynamic robust flow problem generalizes the static version, it is NP-hard to compute an optimal flow. However, our dynamic version is considerably more complex than the static version. We show that it is NP-hard to verify feasibility of a given candidate solution. Furthermore, we investigate temporally repeated flows and show that in contrast to the non-robust case (i.e., without uncertainties) they no longer provide optimal solutions for the robust problem, but rather yield a worst case optimality gap of at least $T$. We finally show that for infinite delays, the optimality gap is at most $O(k \log T)$, where $k$ is a newly introduced instance characteristic. The results obtained in this paper yield a first step towards understanding robust dynamic flow problems with uncertain travel times.

by <a href="">Corinna Gottschalk</a>, <a href="">Arie M.C.A. Koster</a>, <a href="">Frauke Liers</a>, <a href="">Britta Peis</a>, <a href="">Daniel Schmand</a>, <a href="">Andreas Wierz</a> at August 24, 2016 01:30 AM

Wireless Sensor Networks: Local Multicast Study. (arXiv:1608.06511v1 [cs.NI])

In wireless sensor networks and ad-hoc networks, regional multicast (geocasting) means delivering a message from a source point to all nodes in a given geographical area. Practical applications of regional multicast include broadcasting location-related business information to a specified area, large-scale advertising, and sending urgent messages. The design goals of a regional multicast protocol are guaranteed message delivery and low transmission cost. Most of the proposed protocols do not guarantee message delivery, while some others guarantee delivery but incur high transmission costs. The research goal is therefore a local transmission protocol, and its algorithm, that ensures message delivery at low cost, to promote the development of geocast communication. This paper introduces the research background and results, and proposes ideas for solving these problems.

by <a href="">Seyed Hossein Ahmadpanah</a>, <a href="">Abdullah Jafari Chashmi</a>, <a href="">Seyede Samaneh Siadatpour</a> at August 24, 2016 01:30 AM

Adaptive Data Collection Mechanisms for Smart Monitoring of Distribution Grids. (arXiv:1608.06510v1 [cs.SY])

Smart Grid systems not only transport electric energy but also information which will be active part of the electricity supply system. This has led to the introduction of intelligent components on all layers of the electrical grid in power generation, transmission, distribution and consumption units. For electric distribution systems, Information from Smart Meters can be utilized to monitor and control the state of the grid. Hence, it is indeed inherent that data from Smart Meters should be collected in a resilient, reliable, secure and timely manner fulfilling all the communication requirements and standards. This paper presents a proposal for smart data collection mechanisms to monitor electrical grids with adaptive smart metering infrastructures. A general overview of a platform is given for testing, evaluating and implementing mechanisms to adapt Smart Meter data aggregation. Three main aspects of adaptiveness of the system are studied, adaptiveness to smart metering application needs, adaptiveness to changing communication network dynamics and adaptiveness to security attacks. Execution of tests will be conducted in real field experimental set-up and in an advanced hardware in the loop test-bed with power and communication co-simulation for validation purposes.

by <a href="">Mohammed S. Kemal</a>, <a href="">Rasmus L. Olsen</a> at August 24, 2016 01:30 AM

Dijkstra Monads for Free. (arXiv:1608.06499v1 [cs.PL])

Dijkstra monads are a means by which a dependent type theory can be enhanced with support for reasoning about effectful code. These specification-level monads, which compute weakest preconditions, and their closely related counterparts, Hoare monads, provide the basis on which verification tools like F*, Hoare Type Theory (HTT), and Ynot are built. In this paper we show that Dijkstra monads can be derived "for free" by applying a continuation-passing style (CPS) translation to the standard monadic definitions of the underlying computational effects.

Automatically deriving Dijkstra monads provides a correct-by-construction and efficient way of reasoning about user-defined effects in dependent type theories. We demonstrate these ideas in EMF*, a new dependently typed calculus, validating it both by formal proof and via a prototype implementation within F*. Besides equipping F* with a more uniform and extensible effect system, EMF* enables within F* a mixture of intrinsic and extrinsic proofs that was previously impossible.

by <a href="">Danel Ahman</a>, <a href="">Catalin Hritcu</a>, <a href="">Guido Martinez</a>, <a href="">Gordon Plotkin</a>, <a href="">Jonathan Protzenko</a>, <a href="">Aseem Rastogi</a>, <a href="">Nikhil Swamy</a> at August 24, 2016 01:30 AM

Delay Evaluation of OpenFlow Network Based on Queueing Model. (arXiv:1608.06491v1 [cs.DC])

As one of the most popular south-bound protocol of software-defined networking(SDN), OpenFlow decouples the network control from forwarding devices. It offers flexible and scalable functionality for networks. These advantages may cause performance issues since there are performance penalties in terms of packet processing speed. It is important to understand the performance of OpenFlow switches and controllers for its deployments. In this paper we model the packet processing time of OpenFlow switches and controllers. We mainly analyze how the probability of packet-in messages impacts the performance of switches and controllers. Our results show that there is a performance penalty in OpenFlow networks. However, the penalty is not much when probability of packet-in messages is low. This model can be used for a network designer to approximate the performance of her deployments.

by <a href="">Zhihao Shang</a>, <a href="">Katinka Wolter</a> at August 24, 2016 01:30 AM

Multivariate Cryptography with Mappings of Discrete Logarithms and Polynomials. (arXiv:1608.06472v1 [cs.CR])

In this paper, algorithms for multivariate public key cryptography and digital signature are described. Plain messages and encrypted messages are arrays, consisting of elements from a fixed finite ring or field. The encryption and decryption algorithms are based on multivariate mappings. The security of the private key depends on the difficulty of solving a system of parametric simultaneous multivariate equations involving polynomial or exponential mappings. The method is a general purpose utility for most data encryption, digital certificate or digital signature applications.

by <a href="">Duggirala Meher Krishna</a>, <a href="">Duggirala Ravi</a> at August 24, 2016 01:30 AM

Warehousing Complex Archaeological Objects. (arXiv:1608.06469v1 [cs.DB])

Data organization is a difficult and essential component in cultural heritage applications. Over the years, a great amount of archaeological ceramic data have been created and processed by various methods and devices. Such ceramic data are stored in databases that concur to increase the amount of available information rapidly. However , such databases typically focus on one type of ceramic descriptors, e.g., qualitative textual descriptions, petrographic or chemical analysis results, and do not interoperate. Thus, research involving archaeological ceramics cannot easily take advantage of combining all these types of information. In this application paper, we introduce an evolution of the Ceramom database that includes text descriptors of archaeological features, chemical analysis results, and various images, including petrographic and fabric images. To illustrate what new analyses are permitted by such a database, we source it to a data warehouse and present a sample on-line analysis processing (OLAP) scenario to gain deep understanding of ceramic context.

by <a href="">Ayb&#xfc;k&#xeb; Ozt&#xfc;rk</a> (ERIC,ARAR), <a href="">Louis Eyango</a> (ARAR), <a href="">Sylvie Yona Waksman</a> (ARAR), <a href="">St&#xe9;phane Lallich</a> (ERIC), <a href="">J&#xe9;r&#xf4;me Darmont</a> (ERIC) at August 24, 2016 01:30 AM

RELARM: A rating model based on relative PCA attributes and k-means clustering. (arXiv:1608.06416v1 [q-fin.CP])

Following the concept of relative attributes widely used in visual recognition, this article establishes a definition of relative PCA attributes for a class of objects defined by vectors of their parameters. A new rating model (RELARM) is built using relative PCA attribute ranking functions for rating object description and the k-means clustering algorithm. The rating assignment of each rating object to a rating category is derived by projecting the cluster centers on a specially selected rating vector. An empirical study has shown a high level of approximation to the existing S & P, Moody's and Fitch ratings.

by <a href="">Elnura Irmatova</a> at August 24, 2016 01:30 AM

Learning to Communicate: Channel Auto-encoders, Domain Specific Regularizers, and Attention. (arXiv:1608.06409v1 [cs.LG])

We address the problem of learning efficient and adaptive ways to communicate binary information over an impaired channel. We treat the problem as reconstruction optimization through impairment layers in a channel autoencoder and introduce several new domain-specific regularizing layers to emulate common channel impairments. We also apply a radio transformer network based attention model on the input of the decoder to help recover canonical signal representations. We demonstrate some promising initial capacity results from this architecture and address several remaining challenges before such a system could become practical.

by <a href="">Timothy J O&#x27;Shea</a>, <a href="">Kiran Karra</a>, <a href="">T. Charles Clancy</a> at August 24, 2016 01:30 AM

Phased Exploration with Greedy Exploitation in Stochastic Combinatorial Partial Monitoring Games. (arXiv:1608.06403v1 [cs.GT])

Partial monitoring games are repeated games where the learner receives feedback that might be different from adversary's move or even the reward gained by the learner. Recently, a general model of combinatorial partial monitoring (CPM) games was proposed \cite{lincombinatorial2014}, where the learner's action space can be exponentially large and adversary samples its moves from a bounded, continuous space, according to a fixed distribution. The paper gave a confidence bound based algorithm (GCB) that achieves $O(T^{2/3}\log T)$ distribution independent and $O(\log T)$ distribution dependent regret bounds. The implementation of their algorithm depends on two separate offline oracles and the distribution dependent regret additionally requires existence of a unique optimal action for the learner. Adopting their CPM model, our first contribution is a Phased Exploration with Greedy Exploitation (PEGE) algorithmic framework for the problem. Different algorithms within the framework achieve $O(T^{2/3}\sqrt{\log T})$ distribution independent and $O(\log^2 T)$ distribution dependent regret respectively. Crucially, our framework needs only the simpler "argmax" oracle from GCB and the distribution dependent regret does not require existence of a unique optimal action. Our second contribution is another algorithm, PEGE2, which combines gap estimation with a PEGE algorithm, to achieve an $O(\log T)$ regret bound, matching the GCB guarantee but removing the dependence on size of the learner's action space. However, like GCB, PEGE2 requires access to both offline oracles and the existence of a unique optimal action. Finally, we discuss how our algorithm can be efficiently applied to a CPM problem of practical interest: namely, online ranking with feedback at the top.

by <a href="">Sougata Chaudhuri</a>, <a href="">Ambuj Tewari</a> at August 24, 2016 01:30 AM

Formalization of Fault Trees in Higher-order Logic: A Deep Embedding Approach. (arXiv:1608.06392v1 [cs.LO])

Fault Tree (FT) is a standard failure modeling technique that has been extensively used to predict reliability, availability and safety of many complex engineering systems. In order to facilitate the formal analysis of FT based analyses, a higher-order-logic formalization of FTs has been recently proposed. However, this formalization is quite limited in terms of handling large systems and transformation of FT models into their corresponding Reliability Block Diagram (RBD) structures, i.e., a frequently used transformation in reliability and availability analyses. In order to overcome these limitations, we present a deep embedding based formalization of FTs. In particular, the paper presents a formalization of AND, OR and NOT FT gates, which are in turn used to formalize other commonly used FT gates, i.e., NAND, NOR, XOR, Inhibit, Comparator and majority Voting, and the formal verification of their failure probability expressions. For illustration purposes, we present a formal failure analysis of a communication gateway software for the next generation air traffic management system.

by <a href="">Waqar Ahmed</a>, <a href="">Osman Hasan</a> at August 24, 2016 01:30 AM

A New Parallelization Method for K-means. (arXiv:1608.06347v1 [cs.DC])

K-means is a popular clustering method used in data mining area. To work with large datasets, researchers propose PKMeans, which is a parallel k-means on MapReduce. However, the existing k-means parallelization methods including PKMeans have many limitations. It can't finish all its iterations in one MapReduce job, so it has to repeat cascading MapReduce jobs in a loop until convergence. On the most popular MapReduce platform, Hadoop, every MapReduce job introduces significant I/O overheads and extra execution time at stages of job start-up and shuffling. Even worse, it has been proved that in the worst case, k-means needs MapReduce jobs to converge, where n is the number of data instances, which means huge overheads for large datasets. Additionally, in PKMeans, at most one reducer can be assigned to and update each centroid, so PKMeans can only make use of limited number of parallel reducers. In this paper, we propose an improved parallel method for k-means, IPKMeans, which has a parallel preprocessing stage using k-d tree and can finish k-means in one single MapReduce job with much more reducers working in parallel and lower I/O overheads than PKMeans and has a fast post-processing stage generating the final result. In our method, both k-d tree and the new improved parallel k-means are implemented using MapReduce and tested on Hadoop. Our experiments show that with same dataset and initial centroids, our method has up to 2/3 lower I/O overheads and consumes less amount of time than PKMeans to get a very close clustering result.

by <a href="">Shikai Jin</a>, <a href="">Yuxuan Cui</a>, <a href="">Chunli Yu</a> at August 24, 2016 01:30 AM

Job Placement Advisor Based on Turnaround Predictions for HPC Hybrid Clouds. (arXiv:1608.06310v1 [cs.DC])

Several companies and research institutes are moving their CPU-intensive applications to hybrid High Performance Computing (HPC) cloud environments. Such a shift depends on the creation of software systems that help users decide where a job should be placed considering execution time and queue wait time to access on-premise clusters. Relying blindly on turnaround prediction techniques will affect negatively response times inside HPC cloud environments. This paper introduces a tool to make job placement decisions in HPC hybrid cloud environments taking into account the inaccuracy of execution and waiting time predictions. We used job traces from real supercomputing centers to run our experiments, and compared the performance between environments using real speedup curves. We also extended a state-of-the-art machine learning based predictor to work with data from the cluster scheduler. Our main findings are: (i) depending on workload characteristics, there is a turning point where predictions should be disregarded in favor of a more conservative decision to minimize job turnaround times and (ii) scheduler data plays a key role in improving predictions generated with machine learning using job trace data---our experiments showed around 20% prediction accuracy improvements.

by <a href="">Renato L. F. Cunha</a>, <a href="">Eduardo R. Rodrigues</a>, <a href="">Leonardo P. Tizzei</a>, <a href="">Marco A. S. Netto</a> (IBM Research) at August 24, 2016 01:30 AM


Relationship between PP and PH

Toda's theorem says that $PH \subset P^{PP}$. Does this imply any relationship between $PH$ and $PP$ that does not involve oracles? Does it imply either that $PH \subset PP$ or that $PP \subset PH$? Is it known or conjectured whether either of those hold?

by tparker at August 24, 2016 01:28 AM



How to upload datasets and download updated weights file on AWS EC2, Tensorflow?

I learned how to use AWS (creating instances, stopping and terminating them) by following this tutorial. But I don't know how to upload datasets to EC2, nor how to download the updated variables (weights) file from EC2 for later use.

by shader at August 24, 2016 01:20 AM


How to prevent overflow and underflow in the Euclidean distance and Mahalanobis distance

I was working on my project when it struck me to ask whether it would be necessary, or at least prudent, to prevent overflow and underflow in the calculation of these two distances.

I remembered that there is an implementation of the hypotenuse calculation that prevents this. Most language implementations provide it, and it is known as hypot().

The calculation of the Euclidean distance follows the same "pattern", and I thought that if hypot() guards against overflow and underflow, the Euclidean distance should be guarded as well. I was disappointed to find that the language we use, and others, do not guard against overflow and underflow when computing this distance. Is it not worth spending this "additional effort"?

I did a search and came across a question on Math.StackExchange.

There is no definitive answer to that question, and it is somewhat old. The first thing I wondered was: is this approach okay? I think it is, seeing that it is a generalization of the same procedure that hypot() performs.

I decided to extrapolate this concept to the Mahalanobis distance. The original is as follows:

$$D_M(X,Y,L) = \sqrt{\sum_{i=1}^{n}\left(\frac{X_i-Y_i}{L_i}\right)^2}$$

where $L$ is the vector of eigenvalues.

And my proposal is this:

$$D_M(X,Y,L) = C\sqrt{\sum_{i=1}^{n}\left(\frac{X_i-Y_i}{L_i}\frac{1}{C}\right)^2}$$

which is the same as:

$$D_M(X,Y,L) = C\sqrt{\sum_{i=1}^{n}\left(\frac{X_i-Y_i}{L_i C}\right)^2}$$

where $C$ is the maximum of the values $|(X_i-Y_i)/L_i|$:

$$C = \max_{i}\left|\frac{X_i-Y_i}{L_i}\right|$$

Is it okay?
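For what it's worth, the proposal is exactly the n-dimensional generalization of what hypot()-style scaling does: factor out the largest magnitude so no squared term can overflow. A minimal Python sketch of the idea (my own illustration, not from any particular library):

```python
import math
import numpy as np

def scaled_norm(z):
    # hypot-style guard: factor out the largest magnitude before squaring,
    # so every squared term is <= 1 and cannot overflow
    c = np.max(np.abs(z))
    if c == 0.0:
        return 0.0
    return c * math.sqrt(np.sum((z / c) ** 2))

z = np.array([3e200, 4e200])
print(scaled_norm(z))                    # 5e+200
with np.errstate(over="ignore"):
    print(np.sqrt(np.sum(z ** 2)))       # inf: the naive formula overflows
```

For the distance above, z would hold the values (X_i - Y_i)/L_i, so the same guard applies unchanged.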

by Delphius at August 24, 2016 01:19 AM

Algorithm for cycle-detecting comparison

I am looking for an algorithm that I can use to compare nested and potentially recursive data structures, for example to implement the Scheme equal? function. equal? recursively compares two objects for equality and properly handles cycles. Specifically, the algorithm needs to return true iff the (possibly infinite) unfoldings of the graphs would be equal, e.g.

(letrec ((a (cons 1 (cons 2 a)))
         (b (cons 1 (cons 2 (cons 1 (cons 2 b))))))
  (equal? a b))

is true because a and b are both cyclic lists that repeat the sequence (1 2) infinitely.

Destructive modification of traversed nodes is thread-unsafe, requires a spare bit in object headers, and requires a separate traversal to reset the bit. Using a hash table to store object addresses is not safe in the presence of a moving garbage collector unless the GC is blocked for the duration of the traversal (so the operation cannot be implemented except as a primitive).
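One standard approach is coinductive: keep a set of object pairs currently under comparison and treat a revisited pair as equal, which amounts to checking bisimulation of the two graphs. A sketch in Python for concreteness (a hypothetical helper, with cons cells modeled as 2-element lists; note the id-based keys have the same moving-GC caveat raised above, which is why a real Scheme would do this as a primitive or with a GC-stable table):

```python
def cyclic_equal(a, b, _active=None):
    """Structural equality that terminates on cyclic list structure."""
    if _active is None:
        _active = set()
    if a is b:
        return True
    key = (id(a), id(b))
    if key in _active:
        return True  # coinductive hypothesis: assume equal while still comparing
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            return False
        _active.add(key)
        return all(cyclic_equal(x, y, _active) for x, y in zip(a, b))
    return a == b

# cons cells as 2-element lists: a repeats (1 2) with period 2, b with period 4
a = [1, None]; a[1] = [2, a]
b = [1, None]; b[1] = [2, [1, [2, b]]]
print(cyclic_equal(a, b))   # True
```

A False anywhere short-circuits every enclosing all(), so a pair left in the active set after a failed comparison can never flip a later result within the same call.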

by Demetri at August 24, 2016 01:16 AM


Trying to adapt TensorFlow's MNIST example gives NAN predictions

I'm playing with TensorFlow, using the 'MNIST for beginners' example (initial code here). I've made some slight adaptations:

mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)

sess = tf.InteractiveSession()

# Create the model
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

fake_images = mnist.train.images.tolist() 

# Train
for i in range(10):
  batch_xs, batch_ys = fake_images, mnist.train.labels, feed_dict={x: batch_xs, y_: batch_ys})

# Test trained model
print(y.eval({x: mnist.test.images}))

Specifically, I'm only running the training step 10 times (I'm not concerned about accuracy, more about speed). I'm also running it on all the data at once (for simplicity). At the end, I'm outputting the predictions TF is making, instead of the accuracy percentage. Here's (some of) the output of the above code:

 [  1.08577311e-02   7.29394853e-01   5.02395593e-02 ...,   2.74689011e-02
    4.43389975e-02   2.32385024e-02]
 [  2.95746652e-03   1.30554764e-02   1.39354384e-02 ...,   9.16484520e-02
    9.70732421e-02   2.57733971e-01]
 [  5.94450533e-02   1.36338845e-01   5.22132218e-02 ...,   6.91468120e-02
    1.95634082e-01   4.83607128e-02]
 [  4.46179360e-02   6.66685810e-04   3.84704918e-02 ...,   6.51754031e-04
    2.46591796e-03   3.10819712e-03]]

Which appears to be the probabilities TF is assigning to each of the possibilities (0-9). All is well with the world.

My main goal is to adapt this to another use, but first I'd like to make sure I can give it other data. This is what I've tried:

fake_images = np.random.rand(55000, 784).astype('float32').tolist()

Which, as I understand it, should generate an array of random junk that is structurally the same as the data from MNIST. But making the change above, here's what I get:

[[ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]]

Which is clearly much less useful. Looking at each option (mnist.train.images and the np.random.rand option), it looks like both are a list of lists of floats.

Why won't TensorFlow accept this array? Is it simply complaining because it recognizes that there's no way it can learn from a bunch of random data? I would expect not, but I've been wrong before.
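A likely culprit (my assumption, since the trace doesn't show it directly): with a 0.5 learning rate and unstructured inputs, the softmax output saturates to exact 0s and 1s, and the hand-written cross-entropy then computes 0 * log(0) = 0 * -inf = nan, which poisons the weights. A minimal numpy reproduction:

```python
import numpy as np

labels = np.array([0.0, 1.0])   # one-hot label
probs = np.array([0.0, 1.0])    # a softmax output that saturated to 0/1

with np.errstate(divide="ignore", invalid="ignore"):
    loss = -np.sum(labels * np.log(probs))
print(loss)                     # nan, from 0 * log(0)

# clipping the probabilities inside the log avoids the nan
safe = -np.sum(labels * np.log(np.clip(probs, 1e-10, 1.0)))
print(safe)                     # 0.0
```

In TensorFlow terms the usual fixes are clipping, e.g. tf.log(tf.clip_by_value(y, 1e-10, 1.0)), or the fused tf.nn.softmax_cross_entropy_with_logits, which is numerically stable.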

by undo at August 24, 2016 01:15 AM


Planet Theory

Online bin packing with cardinality constraints resolved

Authors: János Balogh, József Békési, György Dósa, Leah Epstein, Asaf Levin
Download: PDF
Abstract: Cardinality constrained bin packing or bin packing with cardinality constraints is a basic bin packing problem. In the online version with the parameter k \geq 2, items having sizes in (0,1] associated with them are presented one by one to be packed into unit capacity bins, such that the capacities of bins are not exceeded, and no bin receives more than k items. We resolve the online problem in the sense that we prove a lower bound of 2 on the overall asymptotic competitive ratio. This closes this long standing open problem, since an algorithm of an absolute competitive ratio 2 is known. Additionally, we significantly improve the known lower bounds on the asymptotic competitive ratio for every specific value of k. The novelty of our constructions is based on full adaptivity that creates large gaps between item sizes. Thus, our lower bound inputs do not follow the common practice for online bin packing problems of having a known in advance input consisting of batches for which the algorithm needs to be competitive on every prefix of the input.

August 24, 2016 01:03 AM

On Low-High Orders of Directed Graphs: Incremental Algorithms and Applications

Authors: Loukas Georgiadis, Aikaterini Karanasiou, Giannis Konstantinos, Luigi Laura
Download: PDF
Abstract: A flow graph $G=(V,E,s)$ is a directed graph with a distinguished start vertex $s$. The dominator tree $D$ of $G$ is a tree rooted at $s$, such that a vertex $v$ is an ancestor of a vertex $w$ if and only if all paths from $s$ to $w$ include $v$. The dominator tree is a central tool in program optimization and code generation and has many applications in other diverse areas including constraint programming, circuit testing, biology, and in algorithms for graph connectivity problems. A low-high order of $G$ is a preorder $\delta$ of $D$ that certifies the correctness of $D$ and has further applications in connectivity and path-determination problems. In this paper, we first consider how to maintain efficiently a low-high order of a flow graph incrementally under edge insertions. We present algorithms that run in $O(mn)$ total time for a sequence of $m$ edge insertions in an initially empty flow graph with $n$ vertices.These immediately provide the first incremental certifying algorithms for maintaining the dominator tree in $O(mn)$ total time, and also imply incremental algorithms for other problems. Hence, we provide a substantial improvement over the $O(m^2)$ simple-minded algorithms, which recompute the solution from scratch after each edge insertion. We also show how to apply low-high orders to obtain a linear-time $2$-approximation algorithm for the smallest $2$-vertex-connected spanning subgraph problem (2VCSS). Finally, we present efficient implementations of our new algorithms for the incremental low-high and 2VCSS problems and conduct an extensive experimental study on real-world graphs taken from a variety of application areas. The experimental results show that our algorithms perform very well in practice.

August 24, 2016 01:02 AM

Fast binary embeddings with Gaussian circulant matrices: improved bounds

Authors: Sjoerd Dirksen, Alexander Stollenwerk
Download: PDF
Abstract: We consider the problem of encoding a finite set of vectors into a small number of bits while approximately retaining information on the angular distances between the vectors. By deriving improved variance bounds related to binary Gaussian circulant embeddings, we largely fix a gap in the proof of the best known fast binary embedding method. Our bounds also show that well-spreadness assumptions on the data vectors, which were needed in earlier work on variance bounds, are unnecessary. In addition, we propose a new binary embedding with a faster running time on sparse data.

August 24, 2016 01:02 AM

A PTAS for the Steiner Forest Problem in Doubling Metrics

Authors: T-H. Hubert Chan, Shuguang Hu, Shaofeng H.-C. Jiang
Download: PDF
Abstract: We achieve a (randomized) polynomial-time approximation scheme (PTAS) for the Steiner Forest Problem in doubling metrics. Before our work, a PTAS is given only for the Euclidean plane in [FOCS 2008: Borradaile, Klein and Mathieu]. Our PTAS also shares similarities with the dynamic programming for sparse instances used in [STOC 2012: Bartal, Gottlieb and Krauthgamer] and [SODA 2016: Chan and Jiang]. However, extending previous approaches requires overcoming several non-trivial hurdles, and we make the following technical contributions.

(1) We prove a technical lemma showing that Steiner points have to be "near" the terminals in an optimal Steiner tree. This enables us to define a heuristic to estimate the local behavior of the optimal solution, even though the Steiner points are unknown in advance. This lemma also generalizes previous results in the Euclidean plane, and may be of independent interest for related problems involving Steiner points.

(2) We develop a novel algorithmic technique known as "adaptive cells" to overcome the difficulty of keeping track of multiple components in a solution. Our idea is based on but significantly different from the previously proposed "uniform cells" in the FOCS 2008 paper, whose techniques cannot be readily applied to doubling metrics.

August 24, 2016 01:02 AM

Quantum Communication Complexity of Distributed Set Joins

Authors: Stacey Jeffery, François Le Gall
Download: PDF
Abstract: Computing set joins of two inputs is a common task in database theory. Recently, Van Gucht, Williams, Woodruff and Zhang [PODS 2015] considered the complexity of such problems in the natural model of (classical) two-party communication complexity and obtained tight bounds for the complexity of several important distributed set joins.

In this paper we initiate the study of the *quantum* communication complexity of distributed set joins. We design a quantum protocol for distributed Boolean matrix multiplication, which corresponds to computing the composition join of two databases, showing that the product of two $n\times n$ Boolean matrices, each owned by one of two respective parties, can be computed with $\widetilde{O}(\sqrt{n}\ell^{3/4})$ qubits of communication, where $\ell$ denotes the number of non-zero entries of the product. Since Van Gucht et al. showed that the classical communication complexity of this problem is $\widetilde{\Theta}(n\sqrt{\ell})$, our quantum algorithm outperforms classical protocols whenever the output matrix is sparse. We also show a quantum lower bound and a matching classical upper bound on the communication complexity of distributed matrix multiplication over $\mathbb{F}_2$.

Besides their applications to database theory, the communication complexity of set joins is interesting due to its connections to direct product theorems in communication complexity. In this work we also introduce a notion of *all-pairs* product theorem, and relate this notion to standard direct product theorems in communication complexity.

August 24, 2016 01:01 AM

Privacy Amplification Against Active Quantum Adversaries

Authors: Gil Cohen, Thomas Vidick
Download: PDF
Abstract: Privacy amplification is the task by which two cooperating parties transform a shared weak secret, about which an eavesdropper may have side information, into a uniformly random string uncorrelated from the eavesdropper. Privacy amplification against passive adversaries, where it is assumed that the communication is over a public but authenticated channel, can be achieved in the presence of classical as well as quantum side information by a single-message protocol based on strong extractors.

In 2009 Dodis and Wichs devised a two-message protocol to achieve privacy amplification against active adversaries, where the public communication channel is no longer assumed to be authenticated, through the use of a strengthening of strong extractors called non-malleable extractors which they introduced. Dodis and Wichs only analyzed the case of classical side information.

We consider the task of privacy amplification against active adversaries with quantum side information. Our main result is showing that the Dodis-Wichs protocol remains secure in this scenario provided its main building block, the non-malleable extractor, satisfies a notion of quantum-proof non-malleability which we introduce. We show that an adaptation of a recent construction of non-malleable extractors due to Chattopadhyay et al. is quantum proof, thereby providing the first protocol for privacy amplification that is secure against active quantum adversaries. Our protocol is quantitatively comparable to the near-optimal protocols known in the classical setting.

August 24, 2016 01:01 AM

Communication complexity of approximate Nash equilibria

Authors: Yakov Babichenko, Aviad Rubinstein
Download: PDF
Abstract: For a constant $\epsilon$, we prove a poly(N) lower bound on the communication complexity of $\epsilon$-Nash equilibrium in two-player NxN games. For n-player binary-action games we prove an exp(n) lower bound for the communication complexity of $(\epsilon,\epsilon)$-weak approximate Nash equilibrium, which is a profile of mixed actions such that at least $(1-\epsilon)$-fraction of the players are $\epsilon$-best replying.

August 24, 2016 01:00 AM


Up and Down days in GBPUSD and a Filter

I want to study whether the odds of an up or down day in a forex pair are 50-50. I count the total number of up and down days over X years and compare it with the total number of days. The results are very close to a 50-50 chance. Now I want to see if, by applying an EMA200 filter, an up day is more probable when the closing price is above the EMA, and vice versa for a closing price below the EMA. The results show that an up day is more probable when the closing price is above the EMA. The question is: does the test have any bias? I am worried that the results aren't valid because of a bias in the test. Because the EMA depends on the price, maybe it's simply obvious that there are more up days when the price is above the EMA.

$ema = EMA(Close, 200);
foreach (NewDay) {
    $Totaldays++;
    if (Today(Close) > Today(Open))  { $Totalup++; }
    if (Today(Close) < Today(Open))  { $Totaldown++; }
    if (Today(Close) == Today(Open)) { $Totaldojis++; }
    if (Today(Close) > Today($ema)) {
        $Totalabove++;
        if (Today(Close) > Today(Open))  { $Upabove++; }
        if (Today(Close) < Today(Open))  { $Downabove++; }
        if (Today(Close) == Today(Open)) { $Dojisabove++; }
    }
    if (Today(Close) < Today($ema)) {
        $Totalbelow++;
        if (Today(Close) > Today(Open))  { $Upbelow++; }
        if (Today(Close) < Today(Open))  { $Downbelow++; }
        if (Today(Close) == Today(Open)) { $Dojisbelow++; }
    }
}
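The same counting can be sketched in Python/pandas (my own sketch; the synthetic price series and the EMA span are assumptions for illustration):

```python
import numpy as np
import pandas as pd

def up_day_stats(open_, close, span=200):
    """Fraction of up days overall, and conditioned on close vs. its EMA."""
    ema = close.ewm(span=span, adjust=False).mean()
    up = close > open_
    above = close > ema
    return {
        "p_up": up.mean(),
        "p_up_above_ema": up[above].mean(),
        "p_up_below_ema": up[~above].mean(),
    }

# synthetic driftless random walk standing in for GBPUSD closes
rng = np.random.default_rng(42)
close = pd.Series(1.30 + rng.normal(0, 0.005, 2000).cumsum())
open_ = close.shift(1).fillna(close.iloc[0])
print(up_day_stats(open_, close))
```

On the bias worry: conditioning today's direction on today's close versus the EMA does have a look-ahead flavor, because the day's close both defines the up day and pushes the EMA toward itself. A cleaner test conditions on yesterday's state, i.e. P(up day today | yesterday's close above yesterday's EMA).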


by tn240 at August 24, 2016 12:44 AM


How do I set TensorFlow RNN state when state_is_tuple=True?

I have written an RNN language model using TensorFlow. The model is implemented as an RNN class. The graph structure is built in the constructor, while RNN.train and RNN.test methods run it.

I want to be able to reset the RNN state when I move to a new document in the training set, or when I want to run a validation set during training. I do this by managing the state inside the training loop, passing it into the graph via a feed dictionary.

In the constructor I define the RNN like so

    cell = tf.nn.rnn_cell.LSTMCell(hidden_units)
    rnn_layers = tf.nn.rnn_cell.MultiRNNCell([cell] * layers)
    self.reset_state = rnn_layers.zero_state(batch_size, dtype=tf.float32)
    self.state = tf.placeholder(tf.float32, self.reset_state.get_shape(), "state")
    self.outputs, self.next_state = tf.nn.dynamic_rnn(rnn_layers, self.embedded_input, time_major=True,
                                                      initial_state=self.state)

The training loop looks like this

 for document in documents:
     state =
     for x, y in document:
          _, state =[self.train_step, self.next_state],
                                      feed_dict={self.x: x, self.y: y, self.state: state})

x and y are batches of training data in a document. The idea is that I pass the latest state along after each batch, except when I start a new document, when I zero out the state by running self.reset_state.

This all works. Now I want to change my RNN to use the recommended state_is_tuple=True. However, I don't know how to pass the more complicated LSTM state object via a feed dictionary. Also I don't know what arguments to pass to the self.state = tf.placeholder(...) line in my constructor.

What is the correct strategy here? There still isn't much example code or documentation for dynamic_rnn available.

TensorFlow issue 2695 appears relevant, but I haven't fully digested it.

by W.P. McNeill at August 24, 2016 12:41 AM



Cartesian product of multiple arrays in JavaScript

How would you implement the Cartesian product of multiple arrays in JavaScript?

As an example,

cartesian([1,2],[10,20],[100,200,300]) //should be
// [[1,10,100],[1,10,200],[1,10,300],[2,10,100],[2,10,200]...]
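The question asks for JavaScript, but the underlying fold is language-independent: start with one empty tuple, then extend every partial tuple with each element of the next array. A Python sketch of the mechanics (my own illustration):

```python
from functools import reduce

def cartesian(*arrays):
    # fold: extend each partial product with every element of the next array
    return reduce(lambda acc, arr: [p + [x] for p in acc for x in arr],
                  arrays, [[]])

print(cartesian([1, 2], [10, 20], [100, 200, 300])[:3])
# [[1, 10, 100], [1, 10, 200], [1, 10, 300]]
```

The same reduce translates almost line for line to JavaScript using Array.prototype.reduce with flatMap, e.g. arrays.reduce((acc, arr) => acc.flatMap(p => => [...p, x])), [[]]).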

by viebel at August 24, 2016 12:22 AM


The importance of the language semantics for code generation and frameworks for code generation in model-driven development

I am implementing a workflow where code in industrial programming languages (JavaScript and Java) should be generated from formal (formally verified) expressions (from ontologies as objects and rule formulas as behaviors). What is the best practice for such code generation? Are any frameworks available for this?

In my opinion, the semantics of the programming language is required, and I should be able to do the code generation in two steps: 1) translate my formal expressions into semantic expressions of the target language; 2) translate the semantic expressions into executable code. In reality, I cannot find any work that connects the semantics of a programming language with code generation.

Is there a special kind of programming-language semantics that is usable not only for the analysis of programs but also for their generation?

My guess is that this should be a really useful approach for generating formally verified code, but I cannot find research work about it. Are there trends, or better search keywords, for this?

Maybe the more relevant question is: what kind of compilers/translators do Model Driven Development tools use for the generation of source code (platform-dependent code), and how can the semantics of a programming language be used in the construction of such compilers?

Note added. There already is a complete unifying (denotational and operational) semantics of Java, JavaScript, and other industrial programming languages in the K framework. So this is more a question about applying the K framework to code generation, if that is possible at all.

by TomR at August 24, 2016 12:13 AM


How is the gradient and hessian of logarithmic loss computed in the custom objective function example script in xgboost's github repository?

I would like to understand how the gradient and hessian of the logloss function are computed in an xgboost sample script.

I've simplified the function to take numpy arrays, and generated y_hat and y_true which are a sample of the values used in the script.

Here is a simplified example:

import numpy as np

def loglikelihoodloss(y_hat, y_true):
    prob = 1.0 / (1.0 + np.exp(-y_hat))
    grad = prob - y_true
    hess = prob * (1.0 - prob)
    return grad, hess

y_hat = np.array([1.80087972, -1.82414818, -1.82414818,  1.80087972, -2.08465433,
                  -1.82414818, -1.82414818,  1.80087972, -1.82414818, -1.82414818])
y_true = np.array([1.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.,  0.])

loglikelihoodloss(y_hat, y_true)

The log loss function is the sum of terms $$-\left[y_i \log p_i + (1-y_i)\log(1-p_i)\right]$$ where $$p_i = \frac{1}{1+e^{-\hat{y}_i}}.$$

The gradient (with respect to $p$) is then $\frac{p-y}{p(1-p)}$, however in the code it is $p - y$.

Likewise the second derivative (with respect to $p$) is $\frac{y}{p^2}+\frac{1-y}{(1-p)^2}$, however in the code it is $p(1-p)$.

How are the equations equal?
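They are reconciled by the chain rule: xgboost differentiates with respect to the raw score y_hat, not the probability p. Since dp/dy_hat = p(1-p), the p-derivatives pick up that factor and collapse to p - y and p(1-p). A finite-difference check of my own:

```python
import numpy as np

def logloss(y_hat, y_true):
    p = 1.0 / (1.0 + np.exp(-y_hat))
    return -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y_hat, y_true, eps = 0.7, 1.0, 1e-4
p = 1.0 / (1.0 + np.exp(-y_hat))

# numerical first and second derivatives with respect to y_hat (not p)
num_grad = (logloss(y_hat + eps, y_true) - logloss(y_hat - eps, y_true)) / (2 * eps)
num_hess = (logloss(y_hat + eps, y_true) - 2 * logloss(y_hat, y_true)
            + logloss(y_hat - eps, y_true)) / eps ** 2

print(abs(num_grad - (p - y_true)) < 1e-6)   # True: grad = p - y_true
print(abs(num_hess - p * (1 - p)) < 1e-4)    # True: hess = p * (1 - p)
```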

by Dave at August 24, 2016 12:01 AM

HN Daily

Planet Theory

Stack and Queue Layouts via Layered Separators

Authors: Vida Dujmović, Fabrizio Frati
Download: PDF
Abstract: It is known that every proper minor-closed class of graphs has bounded stack-number (a.k.a. book thickness and page number). While this includes notable graph families such as planar graphs and graphs of bounded genus, many other graph families are not closed under taking minors. For fixed $g$ and $k$, we show that every $n$-vertex graph that can be embedded on a surface of genus $g$ with at most $k$ crossings per edge has stack-number $\mathcal{O}(\log n)$; this includes $k$-planar graphs. The previously best known bound for the stack-number of these families was $\mathcal{O}(\sqrt{n})$, except in the case of $1$-planar graphs. Analogous results are proved for map graphs that can be embedded on a surface of fixed genus. None of these families is closed under taking minors. The main ingredient in the proof of these results is a construction proving that $n$-vertex graphs that admit constant layered separators have $\mathcal{O}(\log n)$ stack-number.

August 24, 2016 12:00 AM

August 23, 2016


How do I pass a scalar via a TensorFlow feed dictionary

My TensorFlow model uses tf.random_uniform to initialize a variable. I would like to specify the range when I begin training, so I created a placeholder for the initialization value.

init = tf.placeholder(tf.float32, name="init")
v = tf.Variable(tf.random_uniform((100, 300), -init, init), dtype=tf.float32)
initialize = tf.initialize_all_variables()

I initialize variables at the start of training like so., feed_dict={init: 0.5})

This gives me the following error:

ValueError: initial_value must have a shape specified: Tensor("Embedding/random_uniform:0", dtype=float32)

I cannot figure out the correct shape parameter to pass to tf.placeholder. I would think for a scalar I should do init = tf.placeholder(tf.float32, shape=0, name="init") but this gives the following error:

ValueError: Incompatible shapes for broadcasting: (100, 300) and (0,)

If I replace init with the literal value 0.5 in the call to tf.random_uniform it works.

How do I pass this scalar initial value via the feed dictionary?

by W.P. McNeill at August 23, 2016 11:59 PM

Ansible: given a list of ints in a variable, define a second list in which each element is incremented

Let's assume that we have an Ansible variable that is a list_of_ints.

I want to define an incremented_list, whose elements are obtained incrementing by a fixed amount the elements of the first list.

For example, if this is the first variable:

# file: somerole/vars/main.yml

list_of_ints:
  - 1
  - 7
  - 8

assuming an increment of 100, the desired second list would have this content:

incremented_list:
  - 101
  - 107
  - 108

I was thinking of something on the lines of:

incremented_list: "{{ list_of_ints | map('add', 100) | list }}"

Sadly, Ansible has custom filters for logarithms or powers, but not for basic arithmetic, so I can easily calculate the log10 of those numbers, but not increment them.

Any ideas, apart from a pull request on ?
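One workaround that avoids writing a custom filter, assuming your Ansible/Jinja2 version ships the product filter and the built-in sum filter (a sketch, not a tested playbook): pair each element with the increment, then sum each pair.

```yaml
incremented_list: "{{ [100] | product(list_of_ints) | map('sum') | list }}"
```

Here [100] | product([1, 7, 8]) yields [[100, 1], [100, 7], [100, 8]], and mapping the sum filter over those pairs gives [101, 107, 108].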

by muxator at August 23, 2016 11:28 PM


Back propagation in neural networks

I just finished watching these 3 Coursera videos on back propagation in neural networks. I get the idea of what we're trying to do, but I don't get how we achieve it by calculating the error at each step as weight * cascaded error (e.g. the formula at the top right of the screen at 12:07 in the linked video). Let's say we start off with all the weights (theta) at zero. Wouldn't back propagation always calculate 0 error for everything, causing nothing to change?
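The intuition is close and can be checked numerically. With all weights at zero, the hidden deltas (weight * cascaded error) are indeed zero, so the first-layer gradient is zero on the first step. The output-layer gradient is not zero, but it is identical for every hidden unit, so the hidden units can never differentiate from one another; that symmetry, rather than a strictly zero update, is why zero initialization fails. A small numpy sketch of my own, using a sigmoid output with the cross-entropy delta out - y:

```python
import numpy as np

sig = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                # 4 examples, 3 features
y = np.array([[1.0], [0.0], [1.0], [1.0]])

W1 = np.zeros((3, 5))                      # zero-initialized weights
W2 = np.zeros((5, 1))

h = sig(x @ W1)                            # every hidden unit outputs 0.5
out = sig(h @ W2)                          # every prediction is 0.5
d_out = out - y                            # output delta: NOT zero
dW2 = h.T @ d_out                          # nonzero, but identical for all units
d_h = (d_out @ W2.T) * h * (1 - h)         # hidden delta = weight * cascaded error = 0
dW1 = x.T @ d_h                            # zero on the first step

print(np.allclose(dW1, 0.0))               # True
print(np.allclose(dW2, dW2[0, 0]))         # True: perfect symmetry
```

After one update the entries of W2 are all equal, so on later steps the hidden deltas become nonzero but stay identical across hidden units; the network behaves as if it had a single hidden unit. Random initialization breaks the tie.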

by Atte Juvonen at August 23, 2016 11:28 PM

Lambda the Ultimate Forum

Whither FRP?

hi, I was re-reading an LtU blast from the past about FRP and the discussions there made me think to ask this here community to post some concise updates on how FRP research has been going of late. In case any of you in that field have so much free time.

August 23, 2016 11:02 PM



Fortune 1000 companies: which are public utilities?

I am reviewing the Fortune 1000 list of companies. Is there a reliable way using information from EDGAR filings to determine which of these companies are primarily acting as regulated public utilities?

by Full Decent at August 23, 2016 10:18 PM

Fixed Income Ex Ante Tracking Error

Anyone know a good source that walks through how to calculate Ex Ante tracking error for a fixed income portfolio?

by Joseph Roberts at August 23, 2016 09:52 PM


Labeling data for neural net training

Does anyone know of or have a good tool for labeling image data to be used in training a DNN?

Specifically labeling 2 points in an image, like upperLeftCorner and lowerRightCorner, which then calculates a bounding box around the specified object. That's just an example, but I would like to be able to follow the MSCoco data format.


by Anthony Ryan at August 23, 2016 09:36 PM

Lambda the Ultimate Forum

language handling of memory and other resource failures

My idea here is to introduce a place to discuss ideas about handling memory exhaustion, or related resource limit management. The goal is to have something interesting and useful to talk about, informing future designs of programming language semantics or implementation. Thoughtful new solutions are more on topic than anecdotes about old problems. (Feel free to tell an anecdote if followed by analysis of a workable nice solution.) Funny failure stories are not very useful.

Worst case scenarios are also of interest: situations that would be very hard to handle nicely, as test cases for evaluating planned solutions. For example, you might be able to think of an app that would behave badly under a given resource management regime. This resembles thinking of counter examples, but with emphasis on pathology instead of contradiction.

In another discussion topic, Keean Schupke argued that failure due to out-of-memory is an effect, while others suggested it was semantically more out-of-scope than an effect in some languages. I am less interested in what is an effect, and more interested in how to handle problems. (The concept is on topic when the focus is what to do about it. Definitions without use cases seem adrift of practical concerns.)

Relevant questions include: After partial failure, what does code still running do about it? How is it presented semantically? Can failed things be cleaned up without poisoning the survivors afterward? How are monitoring, cleanup, and recovery done efficiently with predictable quality? How do PL semantics help a dev plan and organize system behavior after resource failures, especially memory?

August 23, 2016 09:31 PM

Planet Theory

TR16-132 | On the Sensitivity Conjecture for Read-k Formulas | Mitali Bafna, Satyanarayana V. Lokam, Sébastien Tavenas, Ameya Velingker

Various combinatorial/algebraic parameters are used to quantify the complexity of a Boolean function. Among them, sensitivity is one of the simplest and block sensitivity is one of the most useful. Nisan (1989) and Nisan and Szegedy (1991) showed that block sensitivity and several other parameters, such as certificate complexity, decision tree depth, and degree over R, are all polynomially related to one another. The sensitivity conjecture states that there is also a polynomial relationship between sensitivity and block sensitivity, thus supplying the "missing link". Since its introduction in 1991, the sensitivity conjecture has remained a challenging open question in the study of Boolean functions. One natural approach is to prove it for special classes of functions. For instance, the conjecture is known to be true for monotone functions, symmetric functions, and functions describing graph properties. In this paper, we consider the conjecture for Boolean functions computable by read-k formulas. A read-k formula is a tree in which each variable appears at most k times among the leaves and has Boolean gates at its internal nodes. We show that the sensitivity conjecture holds for read-once formulas with gates computing symmetric functions. We next consider regular formulas with OR and AND gates. A formula is regular if it is a leveled tree with all gates at a given level having the same fan-in and computing the same function. We prove the sensitivity conjecture for constant depth regular read-k formulas for constant k.

August 23, 2016 09:24 PM



Is that possible to use optimization toolbox for gradient boosting model?

I have a decision stump boosting model; here is the essential part, where, for a squared loss function, we fit a new model to the residuals.

fit=DecisionStump(y~., d)

for (i in 2:Niter){

  # adjust objective and rebuild data frame

  # fit base learner to adjusted data

  # update learner

}

My question: is it possible to rewrite the function so that we can use the R optimization toolbox, with something like

  f<-function(x) x^2
  g<-function(x) 2*x
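If it helps, the general pattern is that boosting is gradient descent in function space, so a loss gradient like g above plugs in as the pseudo-residual target. A hypothetical Python sketch (numpy, all names invented, not the R toolbox itself):

```python
import numpy as np

def boost(X, y, fit_stump, n_iter, lr=0.1, grad=lambda y, p: p - y):
    # generic gradient boosting: each round fits a base learner to the negative
    # gradient of the loss (for squared loss, grad = p - y, i.e. the residuals)
    pred = np.zeros_like(y, dtype=float)
    learners = []
    for _ in range(n_iter):
        target = -grad(y, pred)        # pseudo-residuals
        s = fit_stump(X, target)       # any base-learner fitting routine
        learners.append(s)
        pred = pred + lr * s(X)
    return pred, learners

# demo: a "stump" that can only predict a constant (the mean of its target)
def mean_stump(X, target):
    m = float(np.mean(target))
    return lambda X: np.full(len(X), m)

pred, _ = boost(np.arange(3), np.array([1.0, 2.0, 3.0]), mean_stump, n_iter=200)
```

Swapping in a different `grad` is all it takes to change the loss being optimized, which is the sense in which generic optimization machinery applies.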

by hxd1011 at August 23, 2016 08:54 PM


on "On the cruelty of really teaching computing science"

Dijkstra, in his essay On the cruelty of really teaching computing science, makes the following proposal for an introductory programming course:

On the one hand, we teach what looks like the predicate calculus, but we do it very differently from the philosophers. In order to train the novice programmer in the manipulation of uninterpreted formulae, we teach it more as boolean algebra, familiarizing the student with all algebraic properties of the logical connectives. To further sever the links to intuition, we rename the values {true, false} of the boolean domain as {black, white}.

On the other hand, we teach a simple, clean, imperative programming language, with a skip and a multiple assignment as basic statements, with a block structure for local variables, the semicolon as operator for statement composition, a nice alternative construct, a nice repetition and, if so desired, a procedure call. To this we add a minimum of data types, say booleans, integers, characters and strings. The essential thing is that, for whatever we introduce, the corresponding semantics is defined by the proof rules that go with it.

Right from the beginning, and all through the course, we stress that the programmer's task is not just to write down a program, but that his main task is to give a formal proof that the program he proposes meets the equally formal functional specification. While designing proofs and programs hand in hand, the student gets ample opportunity to perfect his manipulative agility with the predicate calculus. Finally, in order to drive home the message that this introductory programming course is primarily a course in formal mathematics, we see to it that the programming language in question has not been implemented on campus so that students are protected from the temptation to test their programs.

He emphasises that this is a serious proposal, and outlines various possible objections, including that his idea is "utterly unrealistic" and "far too difficult."

But that kite won't fly either for the postulate has been proven wrong: since the early 80's, such an introductory programming course has successfully been given to hundreds of college freshmen each year. [Because, in my experience, saying this once does not suffice, the previous sentence should be repeated at least another two times.]

Which course is Dijkstra referring to, and is there any other literature available that discusses it?

The essay appeared in 1988 when Dijkstra was at the University of Texas at Austin, which is probably a clue -- they host the Dijkstra archive but it is huge, and I'm particularly interested in hearing from others about this course.

I don't want to discuss whether Dijkstra's idea is good or realistic here. I considered posting this on or but settled on here because a) a community of educators might be more likely to have someone who can answer easily, and b) Dijkstra himself emphasises that his course is "primarily a course in formal mathematics." Feel free to flag for migration if you disagree.

by m_t_ at August 23, 2016 08:28 PM

Homography from 3D plane to plane parallel to image plane

I have an image in which there is a calibration target (known geometry) in a scene (let's say a simple 2" x 2" square lying on a table). I would like to perform a perspective transformation so that the resulting image is an orthogonal view of the table (as if the camera axis were parallel with the table normal). The general procedure for computing a homography is from a general plane to a different general plane where at least 4 correspondences are known in two images of the same scene. In this case, where I only have one image, is the correct thing to do to simply "make up" a plane and force the correspondences to some arbitrary positions on that plane? For example, in this situation I would simply make correspondences between the 4 detected corners (A,B,C,D) in the image and four points of my choosing (which essentially just define the pixel->real world scale). For example, I could choose A' = (0,0), B' = (20,20), C' = (0,20), D' = (20,0) to indicate that in the resulting image there are 10 pixels per inch. Of course I could choose any scale here, and I could also choose any position for the square target to land in the output (i.e. A' = (100,100), B' = (120,120), C' = (100,120), D' = (120,100)).

Is this the "correct" way to do this? Is there a better way to compute a projective transform that looks directly at a plane defined by a set of points in the image known to be in the plane?
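Forcing the four detected corners onto a chosen metric square is indeed the standard approach; any four target points in "general position" determine the homography up to the scale and placement you pick. A minimal DLT sketch with numpy (the corner coordinates below are made up; OpenCV's getPerspectiveTransform/findHomography compute the same estimate):

```python
import numpy as np

def homography_dlt(src, dst):
    # direct linear transform: two linear constraints per correspondence;
    # the null-space vector of A gives the 9 entries of H (up to scale)
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_point(H, p):
    # apply H in homogeneous coordinates, then dehomogenize
    x, y, w = H @ np.array([p[0], p[1], 1.0])
    return np.array([x / w, y / w])

# hypothetical detected corners and the chosen metric square (10 px/inch)
corners = [(310.0, 120.0), (420.0, 130.0), (415.0, 240.0), (305.0, 235.0)]
square = [(0.0, 0.0), (20.0, 0.0), (20.0, 20.0), (0.0, 20.0)]
H = homography_dlt(corners, square)
```

With exactly four correspondences the fit is exact, so the warped corners land precisely on the chosen square.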

by David Doria at August 23, 2016 08:14 PM


Backpropagation - Neural Networks

How does the output value of step 1 result in the value of "0.582"?

I am looking at an example of the usage of backpropagation in order to have a basic understanding of it. However I fail to understand how the value "0.582" is formed as the output for the example below.

[image: the worked backpropagation example]


I have attempted feed-forward computation, which resulted in an output value of "0.5835...".

Now I am unsure whether the output value in the example above is correct or whether the method I have used is correct.


My FF Calculation

f(x) = 1/(1+e^-x)


NodeJ = f( 1*W1j+0.4*W2j+0.7*W3j )

NodeJ = f( 1(0.2)+0.4(0.3)+0.7(-0.1) ) = f(0.25)

NodeJ = 1/(1+e^-0.25) = 0.562...


NodeI = f( 1*W1i+0.4*W2i+0.7*W3i )

NodeI = f( 1(0.1)+0.4(-0.1)+0.7(0.2) ) = f(0.25)

NodeI = 1/(1+e^-0.25) = 0.562...


NodeK = f( NodeJ * Wjk + NodeI * Wik)

NodeK = f( 0.562(0.1)+0.562(0.5)) = f(0.3372)

NodeK = 1/(1+e^-0.3372) = 0.5835

Output = 0.5835

by Ron at August 23, 2016 08:12 PM


Why does an unsafe state not always cause deadlock?

I was reading Operating Systems by Galvin and came across the following line:

Not all unsafe states are deadlock, however. An unsafe state may lead to deadlock

Can someone please explain how deadlock != unsafe state? I also found the same line here:

If a safe sequence does not exist, then the system is in an unsafe state, which MAY lead to deadlock. ( All safe states are deadlock free, but not all unsafe states lead to deadlocks. )

by vikkyhacks at August 23, 2016 08:12 PM


The output of a softmax isn't supposed to have zeros, right?

I am working on a net in tensorflow which produces a vector which is then passed through a softmax which is my output.

Now I have been testing this and weirdly enough the vector (the one that passed through softmax) has zeros in all coordinate but one.

Based on the softmax's definition with the exponential, I assumed that this wasn't supposed to happen. Is this an error?

EDIT: My vector is 120x160 = 19200. All values are float32
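This can be a real numeric effect rather than a bug: in float32, exp() underflows to exactly zero once a logit sits far enough below the maximum, even though the true softmax value is strictly positive. A small demo:

```python
import numpy as np

# logits far below the max underflow to exactly 0.0 after exp() in float32
logits = np.array([0.0, -50.0, -200.0], dtype=np.float32)
z = np.exp(logits - logits.max())   # the usual max-shift; largest term is exp(0) = 1
p = z / z.sum()

# p[2] is exactly zero: exp(-200) is below float32's smallest subnormal
```

Getting zeros in all coordinates but one therefore just means one logit dominates all the others by a wide margin, which can itself be a symptom of a training problem even though the softmax code is correct.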

by Alperen AYDIN at August 23, 2016 08:01 PM


Coin change problem

I have a homework question as below: There is a currency system that has coins of value v1, v2, ..., vk for some integer k > 1 such that v1 = 1. You have to pay a person V units of money using this currency. Answer the following: (a) (16 points) Let v2 = c^1, v3 = c^2, ..., vk = c^(k-1) for some fixed integer constant c > 1. Design a greedy algorithm that minimises the total number of coins needed to pay V units of money for any given V. Give pseudocode, discuss running time, and give a proof of correctness.

I have this intuition that we should pick the maximum valued coin whenever possible. But I can't get anywhere while trying to prove this. I am trying the "greedy stays ahead" method.
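For reference, the greedy itself is short. A sketch assuming denominations 1, c, c^2, ..., c^(k-1), illustrating the algorithm rather than the correctness proof:

```python
def greedy_coins(V, c, k):
    # denominations 1, c, c^2, ..., c^(k-1); always take the largest coin that fits
    coins = []
    for d in (c ** i for i in range(k - 1, -1, -1)):
        take, V = divmod(V, d)
        coins.extend([d] * take)
    return coins
```

For example, with c=3 and k=4 (coins 1, 3, 9, 27), paying V=38 uses the four coins 27, 9, 1, 1.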

by Ankit Shubham at August 23, 2016 08:00 PM

Is there research into associative/commutative optimizations?

While playing around with optimization sets in LLVM, it occurs to me that the order in which optimizations are run matters greatly since, in general, A(B(src)) is not equal to B(A(src)) where A and B are some optimization of type source -> source and src is of type source.

Are there optimizations for which that property holds? Are there projects or research that attempt to formalize or otherwise create these types of optimizations?

by oconnor0 at August 23, 2016 07:56 PM


NFLVR and HJM framework

The no-arbitrage HJM drift condition is well know, for the traditional (ELMM: Equivalent Local Martingale Measure) formulation of no-arbitrage. My question is:

is there a known necessary and sufficient condition on HJM models $f_t(T)$ to satisfy the NFLVR (No Free Lunch With Vanishing Risk) condition?

If so what is it and what is a good reference?

by CSA at August 23, 2016 07:49 PM

quantlib python : missing methods?

I'm reading Introduction to Selected Classes of the QuantLib Library I by Dimitri Reiswich and trying to "convert" it to Python. It seems to me that some C++ possibilities aren't available in Python. I'm not familiar with SWIG but I guess it's a matter of declaring them in the appropriate *.i files. For instance both of these work, following the pdf text:

January: either QuantLib::January or QuantLib::Jan

print(ql.Date(12, 12, 2015))
print(ql.Date(12, ql.January, 2015))

But why doesn't Jan work?

print(ql.Date(12, ql.Jan, 2015))

In the Calendar description, the two following commented lines return an error; browsing through the code, I failed at finding them. Would someone be kind enough to point me to directions on how to make them available?

import QuantLib as ql

def calendarTesting():
    frankfurtCal = ql.Germany(ql.Germany.FrankfurtStockExchange)
    saudiArabCal = ql.SaudiArabia()
    myEve = ql.Date(31, 12, 2009)
    print('is BD: {}'.format(frankfurtCal.isBusinessDay(myEve)))
    print('is Holiday: {}'.format(frankfurtCal.isHoliday(myEve)))
    # print('is weekend: {}'.format(saudiArabCal.isWeekend(ql.Saturday)))
    print('is last BD: {}'.format(frankfurtCal.isEndOfMonth(ql.Date(30, 12, 2009))))
    # print('last BD: {}'.format(frankfurtCal.endOfMonth(myEve)))


by euri10 at August 23, 2016 07:41 PM


compare the similarity/overlap of two high dimensional datasets [on hold]

I want to predict whether a client will pay their credit card or not using 20 features (such as salary, age, single/married, ...). However, I only have training data for good customers (clients who pay their credit card). Now I need to predict whether an unknown client will pay their credit card or not. Since I don't have training data for bad customers, I don't think I can use a supervised ML algorithm. My guess is that I need to use an unsupervised ML algorithm such as KNN clustering to predict by similarity. However, I am a little confused about how to do it. Does anyone know how to solve this? And is there any Python library that I can use (such as scikit-learn) and examples of similar problems?

Thanks for your help!
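One common framing for this setting is novelty detection: score each new client by distance to the known good clients. A minimal numpy sketch (function name hypothetical; scikit-learn's NearestNeighbors or OneClassSVM are more robust options):

```python
import numpy as np

def novelty_scores(good_clients, new_clients, k=5):
    # distance from each new client to its k-th nearest known-good client;
    # a large score means "unlike any good payer we have seen"
    d = np.linalg.norm(new_clients[:, None, :] - good_clients[None, :, :], axis=2)
    return np.sort(d, axis=1)[:, k - 1]
```

Thresholding the score then gives a crude good/suspect split; with 20 mixed-scale features, standardizing each column first matters a lot for any distance-based method.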

by Robin1988 at August 23, 2016 07:39 PM


How can I boot the PC-BSD live DVD-ISO IMAGE directly via GRUB2?

Via the loopback command, GRUB2 allows booting an ISO file directly.

Now, I've configured the corresponding menuentry to boot the PC-BSD Live DVD ISO, but when I try to boot it, the FreeBSD bootstrap loader outputs:

can't load 'kernel'

Here is the GRUB2 menuentry I currently use:

menuentry "PC-BSD" {
        search --no-floppy --fs-uuid --set root 0d11c28a-7186-43b9-ae33-b4bd351c60ad
        loopback loop /PCBSD9.0-RC1-x64-DVD-live.iso
        kfreebsd (loop)/boot/loader
}

Does anyone know how I'd need to amend this in order to be able to boot the PC-BSD live system?

by user569825 at August 23, 2016 07:36 PM


Is always normalizing features standard practice?

It seems like every single machine learning method (perceptron, SVM, etc) warns you about the need to normalize all the features during preprocessing.

Is this always true for all common machine learning methods? Or am I just running into the few that require normalized features?

by Anton at August 23, 2016 07:26 PM

Choose the best cluster partition based on a cost function

I've a string that I'd like to cluster:


I don't know in advance how many clusters I'll get. All I have is a cost function that can take a clustering and give it a score.

There is also a constraint on the cluster sizes: they must be in a range [a, b]

In my example, for a=3 and b=4, all possible clusterings are:

    ['AAA', 'BBC', 'CCCC'],
    ['AAA', 'BBCC', 'CCC'],
    ['AAAB', 'BCC', 'CCC'],

Concatenation of each clustering must give the string s

The cost function is something like this

cost(clustering) = alpha*l + beta*e + gamma*d


  • l = variance(cluster_lengths)
  • e = mean(clusters_entropies)
  • d = 1 - nb_characters_in_b_that_are_not_in_a/size_of_b (for b the consecutive cluster of a)
  • alpha, beta, gamma are weights

This cost function gives a low cost (0) for the best case:

  1. Where all clusters have the same size.
  2. Content inside each cluster is the same.
  3. Consecutive clusters don't have the same content.

Theoretically, the solution is to calculate the cost of all possible compositions of this string and choose the lowest, but it will take too much time.

Is there any clustering algorithm that can find the best clustering according to this cost function in a reasonable time?
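Brute force is at least easy to write down for short strings. A sketch that enumerates every valid composition, with s = "AAABBCCCCC" reconstructed by concatenating any one of the example clusterings above:

```python
def partitions(s, a, b):
    # every split of s into consecutive chunks whose sizes lie in [a, b]
    if not s:
        yield []
        return
    for size in range(a, min(b, len(s)) + 1):
        for rest in partitions(s[size:], a, b):
            yield [s[:size]] + rest

all_clusterings = list(partitions("AAABBCCCCC", 3, 4))
```

If the cost decomposed over adjacent clusters (the l term does not, but e and d do), a dynamic program over string prefixes could avoid the full enumeration.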

by Ghilas BELHADJ at August 23, 2016 07:13 PM

Determine the Initial Probabilities of an HMM

So I have managed to estimate most of the parameters of a particular Hidden Markov Model (HMM) given the training dataset. These parameters are: the emission probabilities of the hidden states and the transition matrix $P$ of the Markov chain. I used Gibbs sampling for the learning. Now there is one set of parameters still missing, namely the initial probabilities $\pi$ (the probability distribution of where the chain starts), and I want to deduce it from the learned parameters. How can I do it?

Also, is it true that $\pi$ is the same as the stationary probability distribution of $P$?
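A note on the second question: $\pi$ equals the stationary distribution of $P$ only under the extra assumption that the chain starts in steady state; otherwise it is a separate parameter (estimable, e.g., from the sampled first hidden states in the Gibbs runs). Under that assumption, the stationary distribution is the left eigenvector of $P$ for eigenvalue 1; a numpy sketch:

```python
import numpy as np

def stationary_distribution(P):
    # left eigenvector of P for eigenvalue 1, rescaled to a probability vector
    w, V = np.linalg.eig(P.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return v / v.sum()

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = stationary_distribution(P)   # satisfies pi @ P == pi
```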

by Monster42 at August 23, 2016 07:11 PM


Bellman Ford Algorithm fails to compute shortest path for a directed edge-weighted graph

I was recently understanding shortest path algorithms when I encountered the problem below in the book Algorithms, 4th edition, Robert Sedgewick and Kevin Wayne.

Suppose that we convert an EdgeWeightedGraph into an EdgeWeightedDigraph by creating two DirectedEdge objects in the EdgeWeightedDigraph (one in each direction) for each Edge in the EdgeWeightedGraph and then use the Bellman-Ford algorithm. Explain why this approach fails spectacularly.

I've tried many examples on paper, but am unable to find scenarios where the generated directed graph would have new negative cycles in it, simply by converting each edge into two edges in opposite directions. I assume that there were no pre-existing negative cycles in the original undirected edge-weighted graph.
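A way to experiment beyond paper: the conversion turns every undirected edge {u, v} with weight w < 0 into the directed cycle u→v→u of total weight 2w, which Bellman-Ford's cycle check flags. A sketch (helper names hypothetical):

```python
def has_negative_cycle(n, edges):
    # Bellman-Ford cycle check: if a full relaxation pass still changes
    # distances after n passes, some negative cycle exists
    dist = [0.0] * n            # zero-init detects cycles anywhere in the graph
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False
    return True

def doubled(undirected_edges):
    # the conversion described in the exercise: one edge becomes two arcs
    return [(u, v, w) for a, b, w in undirected_edges for u, v in ((a, b), (b, a))]
```

So any single negative undirected edge, with no cycle at all in the original graph, already breaks the doubled digraph.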

by Shubham Mittal at August 23, 2016 07:00 PM

Time Complexity for repeated element

An integer array of size n contains integer values from the range 0 to n-2. Only one of these integers appears twice in the array and all others appear only once. An algorithm has been designed to find this repeated integer in the given array. If this algorithm focuses on using the minimum number of comparison operations, what will be the order of the number of comparison operations used to get the correct answer?
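As a baseline to compare any comparison bound against, the duplicate can even be found with zero comparisons, using only arithmetic on the range sum. A sketch assuming the stated input constraints:

```python
def find_repeated(a):
    # n values drawn from 0..n-2 with exactly one value appearing twice:
    # the duplicate is the surplus over the sum 0 + 1 + ... + (n-2)
    n = len(a)
    return sum(a) - (n - 2) * (n - 1) // 2
```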

by sourav at August 23, 2016 06:59 PM

How do I retrieve text from an embedded script using casperjs? [on hold]

the html element is

window.sawXmlIslandidClientStateXml="<nqw xmlns:saw=\\x22 xmlns:xsi=\x22\x22 xmlns:sawst=\\x22>\u003csawst:clientState>\u003csawst:stateRef>\u003csawst:envState xmlns:sawst=\"\" xmlns:xsi=\"\" xmlVersion=\"200811100\">\u003csawst:container cid=\"d:dashboard\" xsi:type=\"sawst:topLevelStateContainer\">\u003csawst:container cid=\"p:mco0pb0nob7sqjvg\" xsi:type=\"sawst:page\">\u003csawst:container cid=\"s:42263r43nih80fd1\" xsi:type=\"sawst:section\" rendered=\"true\">\u003csawst:container cid=\"g:c452lvndqssjqa45\" xsi:type=\"sawst:dashprompt\" links=\"-\" promptAutoCompleteState=\"off\"/>\u003c/sawst:container>\u003csawst:container cid=\"r:q4g2fiisnvk4nusv\" xsi:type=\"sawst:report\" links=\"fd\" defaultView=\"compoundView!1\" searchId=\"fvup02s9lt0o6urkplv4pqa5ri\" folder=\"/shared/Sales\" itemName=\"All Sales and Inventory Data\"/>\u003csawst:container cid=\"f:dpstate\" xsi:type=\"sawst:dashpromptstate\" statepoolId=\"ih2bj24l46bkgt558qsef04jeq\"/>\u003csawst:container cid=\"s:b0003tc6gnahvsfq\" xsi:type=\"sawst:section\" rendered=\"true\"/>\u003csawst:container cid=\"s:c5j314uterctfb08\" xsi:type=\"sawst:section\" rendered=\"true\"/>\u003c/sawst:container>\u003c/sawst:container>\u003c/sawst:envState>\u003c/sawst:stateRef>\u003csawst:reportXmlRefferedTo>\u003cref statePath=\"d:dashboard~p:mco0pb0nob7sqjvg~r:q4g2fiisnvk4nusv\" searchID=\"8oh8erup3kcqav10ukp36jaof2\">\u003c/ref>\u003c/sawst:reportXmlRefferedTo>\u003c/sawst:clientState></nqw>";

I want to retrieve the string ih2bj24l46bkgt558qsef04jeq under the identifier statepoolId from this script section. So how do I find this script in the HTML and get the string using casperjs?

by Kaa1el at August 23, 2016 06:53 PM



translating performance from EUR to USD [on hold]

Can someone please let me know how to translate a performance return from USD to EUR? For instance, I have a time series (returns) over 7 years from a US hedge fund and would like to translate it into a EUR return. I am not interested in any form of hedging; I just need the return to be denominated in EUR.

Thank you very much!

by srm at August 23, 2016 06:13 PM


AWS Week in Review – Coming Back With Your Help!

Back in 2012 I realized that something interesting happened in AWS-land just about every day. In contrast to the periodic bursts of activity that were the norm back in the days of shrink-wrapped software, the cloud became a place where steady, continuous development took place.

In order to share all of this activity with my readers and to better illustrate the pace of innovation, I published the first AWS Week in Review in the spring of 2012. The original post took all of about 5 minutes to assemble, post and format. I got some great feedback on it and I continued to produce a steady stream of new posts every week for over 4 years. Over the years I added more and more content generated within AWS and from the ever-growing community of fans, developers, and partners.

Unfortunately, finding, saving, and filtering links, and then generating these posts grew to take a substantial amount of time. I reluctantly stopped writing new posts early this year after spending about 4 hours on the post for the week of April 25th.

After receiving dozens of emails and tweets asking about the posts, I gave some thought to a new model that would be open and more scalable.

Going Open
The AWS Week in Review is now a GitHub project. I am inviting contributors (AWS fans, users, bloggers, and partners) to contribute.

Every Monday morning I will review and accept pull requests for the previous week, aiming to publish the Week in Review by 10 AM PT. In order to keep the posts focused and highly valuable, I will approve pull requests only if they meet our guidelines for style and content.

At that time I will also create a file for the week to come, so that you can populate it as you discover new and relevant content.

Content & Style Guidelines
Here are the guidelines for making contributions:

  • Relevance – All contributions must be directly related to AWS.
  • Ownership – All contributions remain the property of the contributor.
  • Validity – All links must be to publicly available content (links to free, gated content are fine).
  • Timeliness – All contributions must refer to content that was created on the associated date.
  • Neutrality – This is not the place for editorializing. Just the facts / links.

I generally stay away from generic news about the cloud business, and I post benchmarks only with the approval of my colleagues.

And now a word or two about style:

  • Content from this blog is generally prefixed with “I wrote about POST_TITLE” or “We announced that TOPIC.”
  • Content from other AWS blogs is styled as “The BLOG_NAME wrote about POST_TITLE.”
  • Content from individuals is styled as “PERSON wrote about POST_TITLE.”
  • Content from partners and ISVs is styled as “The BLOG_NAME wrote about POST_TITLE.”

There’s room for some innovation and variation to keep things interesting, but keep it clean and concise. Please feel free to review some of my older posts to get a sense for what works.

Over time we might want to create a more compelling visual design for the posts. Your ideas (and contributions) are welcome.

Over the years I created the following sections:

  • Daily Summaries – content from this blog, other AWS blogs, and everywhere else.
  • New & Notable Open Source.
  • New SlideShare Presentations.
  • New YouTube Videos including APN Success Stories.
  • New AWS Marketplace products.
  • New Customer Success Stories.
  • Upcoming Events.
  • Help Wanted.

Some of this content comes to my attention via RSS feeds. I will post the OPML file that I use in the GitHub repo and you can use it as a starting point. The New & Notable Open Source section is derived from a GitHub search for aws. I scroll through the results and pick the 10 or 15 items that catch my eye. I also watch /r/aws and Hacker News for interesting and relevant links and discussions.

Over time, it is possible that groups or individuals may become the primary contributor for a section. That's fine, and I would be thrilled to see this happen. I am also open to the addition of new sections, as long as they are highly relevant to AWS.

Earlier this year I tried to automate the process, but I did not like the results. You are welcome to give this a shot on your own. I do want to make sure that we continue to exercise human judgement in order to keep the posts as valuable as possible.

Let’s Do It
I am super excited about this project and I cannot wait to see those pull requests coming in. Please let me know (via a blog comment) if you have any suggestions or concerns.

I should note up front that I am very new to Git-based collaboration and that this is going to be a learning exercise for me. Do not hesitate to let me know if there’s a better way to do things!



by Jeff Barr at August 23, 2016 06:02 PM


De Maizière fervently hopes that none of you ...

De Maizière fervently hopes that none of you caught the Shadow Brokers story. Because if even the US intelligence agency, the one you always ask when your own people are too incompetent, if THEY have their backdoor keys stolen, how can you then stand in front of a microphone and act as if OUR backdoors were secure? They wouldn't even be safe from GCHQ and the NSA! Probably not even from Jugend Forscht.

Quite apart from the fact that I don't want to live in a country where the state believes it must be able to decrypt my diary.

August 23, 2016 06:00 PM


Matlab fitcsvm gives me a zero training error and 40% in testing

I know it's over-fitting to the training data set, yet I don't know how to change the parameters to avoid this. I have tried changing the BoxConstraint from 1e-5 to 1e0, 1e1, and 1e10, and got the same result.

tTargets = ones(size(trainTargets,1),1);

svmModel = fitcsvm(trainData, trainTargets);   % labels assumed to be trainTargets
[Group, score] = predict(svmModel, trainData);
tTargets = ones(size(trainTargets,1),1);
svmTrainError = sum(tTargets ~= Group)/size(trainTargets,1);

[Group, score] = predict(svmModel, testData);
tTargets = ones(size(testTargets,1),1);
svmTestError = sum(tTargets ~= Group)/size(testTargets,1);

I hope someone can help with this. Thanks,

by David Balderas at August 23, 2016 05:41 PM



Path in directed, weighted, cyclic graph with total distance closest to D?

Input: Directed, weighted, cyclic graph G. Two distinct vertices in that graph, A and B, where there exists a path from A to B. A distance d.

Output: A path between A and B with distance closest to d. The path need not be a simple path - it can contain repeated edges and vertices.

Which algorithms exist to solve this problem? I'm looking for an optimal solution, but I'm also interested to see if there are any approximation algorithms as well. Efficiency is not a huge concern - I just want to get an idea of the algorithms available.

Bonus question: are there any algorithms to compute a Hamiltonian path from A to B that visits every node, while finding the distance closest to d?
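For the main question, one concrete exact approach, assuming non-negative integer weights and a cap on how far to search: treat (vertex, accumulated distance) pairs as states and explore them exhaustively. A sketch (there are at most |V| * cap states, which keeps the search bounded):

```python
def closest_path_distance(adj, A, B, d, cap):
    # BFS over (vertex, accumulated distance) states; returns the achievable
    # A->B path distance closest to d, or None if B is unreachable within cap
    seen = {(A, 0)}
    frontier = {(A, 0)}
    while frontier:
        nxt = set()
        for u, dist in frontier:
            for v, w in adj.get(u, []):
                state = (v, dist + w)
                if state[1] <= cap and state not in seen:
                    seen.add(state)
                    nxt.add(state)
        frontier = nxt
    hits = [dist for v, dist in seen if v == B]
    return min(hits, key=lambda x: abs(x - d)) if hits else None
```

Since paths may repeat edges, the achievable distances can form an arithmetic-like set (e.g. going back and forth over a cycle), which is why a cap somewhat above d is needed to terminate.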

by Steven Schmatz at August 23, 2016 05:22 PM


counting patterns in image

I'm working on an algorithm that counts patterns (bars) in a specific image. It seemed very simple at first glance, but I quickly realized the complexity. I have tried simple thresholding, template matching (small sliding windows), edge detection... I have just a few images like this one, so I think that a machine learning algorithm can't give better results! But I still need suggestions. [image: sample image with bars to count]

by Abdelhak Mahmoudi at August 23, 2016 04:53 PM


Running time of naive recursive implementation of unbounded knapsack problem

How does one go about analyzing the running time of a naive recursive solution to the unbounded knapsack problem? Lots of sources say the naive implementation is "exponential" without giving more detail.

For reference, here's a bit of Python code that implements the brute-force solution. Note that this can run for a long time even on smallish inputs. One of the interesting things about knapsack is that some inputs are a lot harder than others.

import random, time
import sys

class Item:
    def __init__(self, weight, value):
        self.weight = weight
        self.value = value

    def __repr__(self):
        return "Item(weight={}, value={})".format(self.weight, self.value)

def knapsack(capacity):
    if capacity==0: return ([], 0)
    max_value = 0
    max_contents = []
    for item in items:
        if item.weight <= capacity:
            (contents, value) = knapsack(capacity-item.weight)
            if value + item.value > max_value:
                max_value = value + item.value
                max_contents = [item] + contents
    return (max_contents, max_value)

def generate_items(n, max_weight, vwratio=1):
    items = []
    weights = random.sample(range(1,max_weight+1),n)
    for weight in weights:
        variation = weight/10
        value = max(1, int(vwratio*weight + random.gauss(-variation, variation)))
        item = Item(weight, value)
        items.append(item)
    return items


n, max_item_weight, capacity = 8, 20, 30  # example parameters
items = generate_items(n=n, max_weight=max_item_weight, vwratio=1.1)

st = time.time()
solution, value = knapsack(capacity)
print("completed in %f"%(time.time() - st))

Note that this algorithm can be improved upon nicely by memoization, yielding an O(nW) pseudo-polynomial time solution, but I was interested in understanding how to analyze the brute-force algorithm more precisely.
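For comparison, a memoized value-only variant (hypothetical names; items passed as hashable (weight, value) pairs) that solves each residual capacity exactly once:

```python
from functools import lru_cache

def knapsack_memo(capacity, items):
    # items: tuple of (weight, value) pairs; returns the best total value.
    # each capacity 0..W is solved once, giving O(n * W) work overall
    @lru_cache(maxsize=None)
    def best(c):
        return max((best(c - w) + v for w, v in items if w <= c), default=0)
    return best(capacity)
```

The contrast with the brute-force version makes the exponential blow-up visible: without the cache, the same residual capacities are re-solved an exponential number of times.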

by cbare at August 23, 2016 04:51 PM


Browser buttons not seen as buttons

A user fails to recognize native unstyled buttons in the browser. Something amazing has happened: every website has replaced buttons with custom styled widgets, and users never see real buttons anymore. There was a time when users would recognize a button, even if it didn't say "click me!", because it looked like a button.

Original title: I can’t even.


by tedu at August 23, 2016 04:35 PM


Scala Higher Order Functions and Implicit Typing

I recently started working with Functional Programming in Scala and am learning Scala in the process. While attempting one of the Chapter 2 exercises to define a function that curries another function, I ran into this:

If I write

def curry[A,B,C](f: (A,B) => C): A => B => C =
    a: A => b: B => f(a, b)

then I get

Chapter2.scala:49: error: ';' expected but ':' found.
a: A => b: B => f(a, b)
one error found

BUT if I write

def curry[A,B,C](f: (A,B) => C): A => B => C =
    a => b => f(a, b)

then it compiles fine, with no warnings, and works. What's the difference?

by Ben Knoble at August 23, 2016 04:33 PM

Unbalanced labels - Better results in Confusion Matrix

I have unbalanced labels. That is, in a binary classifier, I have more positive (1) data and less negative (0) data. I'm using Stratified K-Fold Cross Validation and getting true negatives as zero. Could you please let me know what options I have to get a value greater than zero for true negatives? Thanks in advance!

by MLUser2016 at August 23, 2016 04:25 PM

set the number of folds within GridSearchCV (scikit-learn)

How to set the number of folds within GridSearchCV?

I'm using GridSearchCV in order to tune multiple hyperparameters. One of them is the number of folds, but I can't set it in the way I'm used to:

grid = GridSearchCV(
    cv=StratifiedKFold(y, n_folds=2)   # <= change n_folds

params = {
    'cv': [5,7,10],   # this doesn't work
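The number of folds is an argument of GridSearchCV itself, not of the estimator, so it cannot go inside `param_grid`. One workaround is an outer loop over fold counts; a sketch, assuming a modern scikit-learn (where StratifiedKFold takes `n_splits`) and synthetic data from `make_classification`:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
param_grid = {"C": [0.1, 1, 10]}

# One grid search per fold count; compare the best scores afterwards.
best_per_cv = {}
for n_folds in [5, 7, 10]:
    grid = GridSearchCV(SVC(), param_grid,
                        cv=StratifiedKFold(n_splits=n_folds))
    grid.fit(X, y)
    best_per_cv[n_folds] = grid.best_score_
```

Note that scores obtained with different fold counts are not strictly comparable, since each uses differently sized validation sets.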

by Mike Dooley at August 23, 2016 04:21 PM

Redux store with Math formulas as functions

While writing an engineering application with a React Redux framework, we have come across an issue of having a database of products that have functions to work out their load capacities and other properties. I know it is not a good idea to load the functions into the store, and retrieving the functions from another location in the reducer breaks purity and makes the reducer much harder to test.

Is there any React Redux way of supplying the reducers with the database of product functions as a parameter, or similar, without putting them in the store and without breaking purity?


Each of the products has functions that might describe, for example, the relationship between jack extension and load capacity. This relationship is usually non-linear and has a graph that will relate the capacity over its extendable range. We have used curve fitting tools to match these graphs to functions over their range. I would like to be able to use these functions in a reducer such that when someone selects a product and extension we can obtain the capacity and check its suitability against other calculated loads in the state.

by Adam at August 23, 2016 04:13 PM

Mismatch between training error and results on training set examples after rbm re-training/tuning

I’m having some trouble tuning an RBM in a specific fashion. My goal is to retrain an RBM in such a way that a full generation of an example can be made from only half the input. In my case a full input vector would be two separate digits from the MNIST dataset.

The RBM I’m currently stuck with is the top layer of a system, with two parallel RBMs below it, that is supposed to be a sort of association layer. The idea is to disconnect the bidirectional weights of the RBM into recognition (bottom-up) and generation (top-down) weights. The RBM implementation used is a modified example from the Theano deep learning tutorial:

Plotting examples directly after training, using a full input vector, produced good results, since the input is correctly generated after a full pass through the system.

The code I’m currently using calculates an error between a target activation, computed from the whole input vector using the weights as they were after initial training, and a partial activation created after setting the second half of the input to 0 (this uses the recognition weights that are being trained).

The gradient of this error function is taken with respect to the recognition weights (which are only used in generating the partial activation), and the recognition weights are updated accordingly. The error decreases during this training/tuning step, sometimes to as low as 0.007 per batch of 25 after 50 epochs, but the results aren’t very good.

The first half of the input is represented decently well when an example is plotted, but the second half is not represented well at all. This conflicts with the low error rates shown during the training/tuning step, and I have no idea why, especially since the examples are selected from the training set. The dataset consists of 25k examples of length 1000 that are the results of the bottom two RBMs stitched together.

I’ve tried numerous update rules for the recognition weights (including direct update formulas), as well as different batch sizes and learning rates, but nothing seems to work.

by Tom van de Poll at August 23, 2016 04:11 PM

How to write testable code in Swift

So this question of mine all began when I started to do unit testing for a simple two lines of postNotification and addObserver. From this similar question you can see that to make it testable you need to add ~20 lines and depart from the common way of writing your code.

Facing this problem was actually the first time I understood the difference between unit testing and TDD. Unit testing is easy if your code is testable, i.e. if you are following the TDD mindset. Next I looked into how to write testable code, for which I didn't find many guidelines; every tutorial simply jumps into writing a unit test. Apple's own documentation has nothing on this.

My initial thought was that I need to aim for functional programming and write my functions in a pure-function way. But then again this is very time-consuming and may require lots of refactoring in existing code, or, even for new projects, lots of added lines, and I'm not even sure that is the correct approach. Are there any suggested guidelines or standards for writing testable code?

I am not asking for how to write a testable code for NSNotificationCenter, I am asking for general guidelines for writing testable code.

by Honey at August 23, 2016 04:08 PM

Is my RNN underperforming?

After getting a hold of backprop and implementing my first simple feed-forward NN in numpy, I decided to try making it recurrent. I'm not sure if I'm doing this completely correctly, but here's what I've done: I've defined a set of weight matrices, one per hidden layer, to connect the hidden layers of the previous state to those of the current state. For forward propagation, I'm taking the dot product of the previous hidden layer states and their respective weight matrices (previous hidden to current hidden), and then I'm adding this new activation to the current state's activation for each hidden layer. Then I'm passing the new combined activation through a sigmoid function. For backprop, I'm calculating the deltas for these weights by taking the dot product of the current state's hidden layer errors (for each hidden layer) and the previous state's hidden layer activations, and then I'm subtracting the deltas from the (previous state to current state) weight matrices.

I tested it on simple examples like getting it to learn how to replicate the word "hello" while only being trained on one letter at a time (like the example from Karpathy's article on RNNs) and it works, but it requires large hidden layers and over a thousand training iterations to actually learn how to produce "hello" given the seed letter "h" (I got it working with a single hidden layer with 50 nodes). Not sure if it's relevant, but for the "hello" example I was using 28 inputs (1-of-k encoded vector for 26 letters plus " " and ".") instead of just 4 inputs like in Karpathy's example. (EDIT: just tried with only 4 inputs (and 10 hidden nodes) and it converges in around 5-20 epochs most of the time, but sometimes it never converges if it has multiple hidden layers. Much more reasonable, but not sure if this is still worse than expected.) Is this normal for a vanilla RNN, or have I done something wrong? It seems like it's underperforming, but I don't really have a reference point.
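For comparison, the forward pass described above can be sketched for a single hidden layer (all shapes and weight names here are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_forward(xs, W_xh, W_hh, W_hy):
    """Vanilla-RNN forward pass matching the description: add the previous
    hidden state's contribution (through the recurrent weights W_hh) to the
    current input activation, then squash with a sigmoid."""
    h = np.zeros(W_hh.shape[0])
    outputs = []
    for x in xs:
        h = sigmoid(W_xh @ x + W_hh @ h)   # recurrent term added pre-nonlinearity
        outputs.append(W_hy @ h)
    return np.array(outputs), h
```

This matches the structure described in the question; for the "hello" task, `xs` would be the sequence of one-hot input vectors.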

Thanks for the help!

by nagasgura at August 23, 2016 04:06 PM

What is the output of a machine learning algorithm?

I'm starting to study machine learning and have a basic knowledge of it. If I consider a generic machine learning algorithm M, I would like to know precisely what its inputs and outputs are. I'm not referring to some implementation in a particular programming language; I'm talking about the theory of machine learning. Take the example of supervised learning. The input of M should be the collection of pairs related to the function f the algorithm must learn, from which it will build some function h which approximates f. Should the output of M be h? And what about unsupervised machine learning?

by foolcool at August 23, 2016 03:51 PM


Is NP-complete complexity defined in terms of polynomial reductions or polynomial transformations? [duplicate]


How do you know that a decision problem $X$ is NP-complete? Is it when all other NP problems polynomially transform to $X$, or when all other NP problems polynomially reduce to $X$ (i.e., there exists a polynomial-time algorithm for every problem in NP using an oracle for $X$)?

Definitions seem to differ all over the web. Thanks!

by Doc at August 23, 2016 03:48 PM


Multi-label feature selection using sklearn

I'm looking to perform feature selection with a multi-label dataset using sklearn. I want to get the final set of features across labels, which I will then use in another machine learning package. I was planning to use the method I saw here, which selects relevant features for each label separately.

from sklearn.svm import LinearSVC
from sklearn.feature_selection import chi2, SelectKBest
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
clf = Pipeline([('chi2', SelectKBest(chi2, k=1000)),
                ('svm', LinearSVC())])
multi_clf = OneVsRestClassifier(clf)

I then plan to extract the indices of the included features, per label, using this:

selected_features = []
for i in multi_clf.estimators_:
    selected_features += list(i.named_steps["chi2"].get_support(indices=True))

Now, my question is, how do I choose which selected features to include in my final model? I could use every unique feature (which would include features that were only relevant for one label), or I could do something to select features that were relevant for more labels.

My initial idea is to create a histogram of the number of labels a given feature was selected for, and to identify a threshold based on visual inspection. My concern is that this method is subjective. Is there a more principled way of performing feature selection for multilabel datasets using sklearn?
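The histogram idea can be made concrete by counting, per feature index, how many one-vs-rest estimators selected it; a sketch with toy indices standing in for the `selected_features` list built above, and a hypothetical `min_labels` threshold:

```python
from collections import Counter

# Toy stand-in for the per-label `selected_features` list built above.
selected_features = [0, 2, 5, 2, 7, 2, 0]

counts = Counter(selected_features)   # feature index -> number of labels selecting it
min_labels = 2                        # hypothetical threshold
final_features = sorted(f for f, c in counts.items() if c >= min_labels)
print(final_features)  # → [0, 2]
```

Picking `min_labels` by visual inspection of the histogram is still subjective; one more principled alternative is to treat it as a hyperparameter and choose it by cross-validated downstream performance.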

by user2589328 at August 23, 2016 03:39 PM


Calculating Implied Forward Rates from Eurodollar Futures Quotes

I'm trying to calculate the implied forward rates of the Eurodollar (USD) curve, knowing that the Eurodollar curve is supposed to be a mirror of the yield curve (else arb).

I have this formula for the value of the strip:

$Strip = \displaystyle \frac{\prod_{i= 1}^{n}\bigg(1 + R_i \cdot \big(\frac{days_i}{360} \big) \bigg) - 1}{\frac{term}{360}}$

Using this for current values of LIBOR, I have /GEZ6, /GEH7, /GEM7, /GEU7 to replicate a 1-year forward curve. The rates are $R_1 = 93.5bp$, $R_2 = 95bp$, $R_3 = 98bp$, $R_4 = 101bp$. Using this formula gives me the value of the strip at 97.2 basis points, which I'm confident is wrong.
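The quoted strip formula can be evaluated directly; a sketch, with the rates written as decimals and hypothetical 91-day accrual periods for the four quarterly contracts:

```python
def strip_rate(rates, days, term):
    """Evaluate the strip formula above:
    ((prod_i (1 + R_i * days_i/360)) - 1) / (term/360)."""
    acc = 1.0
    for r, d in zip(rates, days):
        acc *= 1.0 + r * d / 360.0
    return (acc - 1.0) / (term / 360.0)

# The four quoted rates, with hypothetical 91-day accrual periods:
rates = [0.00935, 0.0095, 0.0098, 0.0101]
print(strip_rate(rates, [91] * 4, term=364))
```

The actual day counts depend on the contracts' IMM dates, so the 91-day periods here are only illustrative.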

How do I value the 1-year interest rate forward at December?

by Jared at August 23, 2016 03:37 PM



Estimating the rank of a large sparse matrix

Consider a large sparse n by n matrix. Are there any methods to estimate its rank in time roughly proportional to the number of elements in the matrix?

by Lembik at August 23, 2016 03:24 PM



What is the analytic solution of the expected shortfall for annual losses?

Assume we have annual losses $Z_i \sim Lognormal(0, 1)$, and $Z = \sum_{i=1}^N Z_i$, $N$ is fixed, so what is the closed form of the expected shortfall $ES_{0.99}$?

by Fly_back at August 23, 2016 03:19 PM


Average prefix code length of every 4-sized frequency vector is bounded by 2

I'm trying to show that for every frequency vector $(p_1, p_2, p_3, p_4)$ such that $\sum_{i=1}^4 p_i=1$, the average word length output by the Huffman algorithm is bounded by 2: if $(w_1,w_2,w_3,w_4)$ is the output code, then $\sum_{i=1}^4 p_i |w_i| \le 2$.

I've tried looking at the tree generated by the Huffman algorithm, but the thing is that several different tree structures match different 4-sized frequency vectors, and I can't say anything general about all of them.

Also, is there a more general theorem for $k, n$ (here $k=4, n=2$)?
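The bound can at least be checked empirically before attempting a proof; a sketch that runs Huffman's algorithm on random 4-symbol distributions:

```python
import heapq
import itertools
import random

def huffman_lengths(probs):
    """Code lengths assigned by Huffman's algorithm to each symbol index."""
    tie = itertools.count()   # tie-breaker so equal weights never compare dicts
    heap = [(p, next(tie), {i: 0}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # merging two subtrees pushes every leaf in them one level deeper
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

# Empirical check of sum_i p_i |w_i| <= 2 over random 4-symbol distributions.
random.seed(0)
for _ in range(1000):
    cuts = sorted(random.random() for _ in range(3))
    p = [cuts[0], cuts[1] - cuts[0], cuts[2] - cuts[1], 1 - cuts[2]]
    avg = sum(p[i] * length for i, length in huffman_lengths(p).items())
    assert avg <= 2 + 1e-9
```

The uniform vector $(0.25, 0.25, 0.25, 0.25)$ achieves the bound exactly (all code lengths 2), which suggests 2 is tight for $k=4$.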

by Swanson at August 23, 2016 03:16 PM

Why do Tarjan's and Kosaraju's algorithms for finding strongly connected components have the same running-time complexity?

I followed an explanation of Kosaraju's and Tarjan's strongly-connected components algorithms, and they say that both have O(|V|+|E|) time complexity.

That didn't make sense to me, since Kosaraju's uses two DFS passes and computes the transposed graph, but Tarjan's uses only one DFS.

by hasane has at August 23, 2016 03:12 PM


Help us name a new Mercurial feature

Jun Wu of Facebook is proposing a new hg feature which helps with editing a commit stack. Basically, it looks through your recent draft commits (commits which haven’t been pushed to a public repo) and your working directory, and automatically updates the right draft commits with the changes in your working directory that best correspond, according to hg annotate/blame information. The intent, of course, is to make it easier and more automatic to clean up a series of WIP commits.

What should this be called? Current proposals are stuff like hg amend --ancestors or hg histedit --smart. Jun Wu called it hg smartfixup. I don’t really like “smart” myself as I don’t find it very descriptive, and we already use “amend” essentially as a synonym for “fixup”.

We take naming things kind of seriously, as they are our UI. Like the Master said, “If language is not correct, then what is said is not what is meant; if what is said is not what is meant, then what must be done remains undone; if this remains undone, morals and art will deteriorate; if justice goes astray, the people will stand about in helpless confusion. Hence there must be no arbitrariness in what is said. This matters above everything.” ;-)

by JordiGH at August 23, 2016 03:01 PM


Oh wow, Nvidia has lost a lot of market share. That ...

Oh wow, Nvidia has lost a lot of market share. I didn't see that coming. I'll have to take back my criticism of AMD's RX480 strategy. Wow, that is a bloodbath.

That also explains why Nvidia jacked up its prices so much. Not just because they could, but because they thought they had to.

August 23, 2016 03:00 PM



How to get probability using libSVM (package e1071) in R?

I'm trying to get probability output from libSVM (package e1071 in R), but with my dataset the output is only TRUE or FALSE.

Here is the code:

dadosBrutos<-read.csv("Dataset/",header = FALSE)

svm.modelo <- svm(V3 ~ ., 
svm.predict <- predict(svm.modelo,
                       select = -V3),

posterior <- as.matrix(svm.predict)

But when I use the Iris dataset, for example, the probability output is a percentage per class and not just the name of the class.


model <- svm(Species ~ ., data = iris, probability=TRUE)
pred <- predict(model, iris, probability=TRUE)
head(attr(pred, "probabilities"))

#      setosa versicolor   virginica
# 1 0.9803339 0.01129740 0.008368729
# 2 0.9729193 0.01807053 0.009010195
# 3 0.9790435 0.01192820 0.009028276
# 4 0.9750030 0.01531171 0.009685342
# 5 0.9795183 0.01164689 0.008834838
# 6 0.9740730 0.01679643 0.009130620

Can someone help me to understand this?

Thank you Albert F. J. Costa

by Albert Josuá at August 23, 2016 02:55 PM

Using SMOTE on unbalanced dataset

I have a 2-class unbalanced dataset where the ratio is 20:1.

I am using SMOTE to oversample the minority class, and wanted to know, when using SMOTE to develop a usable model, whether it is best to oversample so that the minority class matches the other class (i.e. 1:1), or to establish through trial and error the lowest ratio that improves the model to an acceptable level (e.g. F1 score > 0.7), so as not to use too many synthetic samples, if that makes sense.
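For reference, the core of SMOTE is simple enough to sketch directly (this is a minimal illustration, not the imbalanced-learn API): each synthetic sample interpolates between a minority point and one of its k nearest minority neighbours, so the number of generated rows directly controls the final class ratio.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE sketch. X_min holds only minority-class rows; each
    synthetic row lies on a segment between a minority point and one of
    its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    # pairwise squared distances within the minority class
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude self as a neighbour
    k = min(k, len(X_min) - 1)
    nbrs = np.argsort(d2, axis=1)[:, :k]
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = nbrs[i, rng.integers(k)]
        lam = rng.random()                # uniform step along the segment
        new.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(new)
```

With a 20:1 ratio, choosing `n_new` to bring the minority up to, say, half the majority count gives a 1:2 ratio rather than 1:1, which is exactly the trade-off the question asks about.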

Any thoughts/advice appreciated.

by Byron Rogers at August 23, 2016 02:41 PM


Top Eight Must-Listen Developer Podcasts

This is a list of programming podcasts.

To the list I’ll add Garbage, by our own @jcs


by fcbsd at August 23, 2016 02:32 PM


Calculating the net interest income and net interest margin

The table below contains financial statement information from the CBA 2013 annual report.

I am asked to find the net interest income and net interest margin:

My answers are as follows:

Net interest income = interest income - interest expense = 35 - 21 = 14

Net income margin = (673-632)/673.

I'm not sure how to calculate the net interest margin, but would anyone be able to verify whether my answers are correct?

by user24093 at August 23, 2016 02:08 PM



Is it possible to build TensorFlow for GTX 1070?

I have an Ubuntu 14.04 LTS 64-bit with an Nvidia video card - GTX 1070 (10th generation). I'm trying to build TensorFlow.

I tried building it with CUDA 7.5 and CuDNN 5, but it turned out the CUDA 7.5 I installed requires the 352.63.0 video driver, while the video driver I downloaded from Nvidia for GTX 1070 was 367.35 - a newer version.

TensorFlow managed to build, but when I ran the example, there was a problem in runtime:

boyko@boyko-pc:~/Desktop/tensorflow/tensorflow/models/image/mnist$ LD_LIBRARY_PATH=/usr/local/cuda-7.5/targets/x86_64-linux/lib python3

It failed to find CUDA because of driver mismatch:

E tensorflow/stream_executor/cuda/] failed call to cuInit: CUDA_ERROR_NO_DEVICE
E tensorflow/stream_executor/cuda/] kernel version 367.35.0 does not match DSO version 352.63.0 -- cannot find working devices in this configuration

Full log -

CUDA 7.5 needs the 352.63 video driver, but the GTX 1070 needs 367.35. The problem is that TensorFlow officially supports only CUDA 7.5, so the requirements are a bit contradictory.

What do I need to do? Is it possible to use the 352.63 driver on a GTX 1070, and will it run, even if it enables a limited feature set? Or is there a CUDA 7.5 version built against this driver, or is there a way to build TensorFlow against CUDA 8.0?

This is a related question I found - Tensorflow Bazel 0.3.0 build CUDA 8.0 GTX 1070 fails.

by Boyko Perfanov at August 23, 2016 02:07 PM



Positive Cross-Autocorrelation

Can anyone explain to me what positive cross-autocorrelation is? Lo and MacKinlay (1990) refer to it. I am aware of positive autocorrelation and positive cross-correlation, but can't get my head around positive cross-autocorrelation.

Thanks in advance

by Sam P at August 23, 2016 01:56 PM


Variable method reference in java 8

I am trying to create a method reference from a variable which holds the name of some method of an object:

SomeClass obj = new SomeClass();
String methodName = "someMethod";

I am looking for a way to create exactly obj::someMethod, but using the variable methodName for this. Is it possible?

I know how to create a functional interface instance from methodName and obj:

() -> {
    try {
        return obj.getClass().getMethod(methodName).invoke(obj);
    } catch (NoSuchMethodException | IllegalAccessException
            | InvocationTargetException e) {
        return null;
    }
}

but I am wondering whether this can be done in a more shorthand way.

by user3316027 at August 23, 2016 01:39 PM


Is “Binary Rectangle Tree” NP-hard?

It's a 2D version of this problem:

The input is set of $n$ rectangles $R=\{R_1, \dots,R_n\}$, where each $R_i=I_1 \times I_2$ and $I_j$ are real intervals. The output should be the following rooted binary tree. Each leaf node corresponds to a rectangle from $R$. Each interior node contains a rectangle enclosing rectangles from both child nodes. The goal is to minimize the sum of surface areas of rectangles in interior nodes.

Example. Input $R=\{R_1, R_2, R_3\}=\{[0,1]\times[0,1],[1,2]\times[0,2],[20,25]\times[1,5]\}$

leaf nodes: $L_1=R_1,~L_2=R_2,~L_3=R_3$

interior nodes $I_1=(L_1,L_2),~I_2=(I_1,L_3)$


total sum of surface areas is $129$

The 1D version can be solved in polynomial time by dynamic programming. For the 2D version I can use dynamic programming, but it needs exponential memory. I don't see any obvious NP-completeness reduction for this problem.

Is the 2D version NP-hard?
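For small instances the optimum can be computed by brute force over all binary merge trees, which also reproduces the 129 of the example; a sketch, with rectangles given as (x1, x2, y1, y2) tuples:

```python
from functools import lru_cache

def min_tree_cost(rects):
    """best(S) = area(bbox(S)) + min over partitions S = S1 ∪ S2 of
    best(S1) + best(S2), with best({i}) = 0 for leaves. Subsets are
    bitmasks; time and memory are exponential in n, as noted for the DP."""
    n = len(rects)

    def bbox_area(mask):
        idx = [i for i in range(n) if mask >> i & 1]
        x1 = min(rects[i][0] for i in idx)
        x2 = max(rects[i][1] for i in idx)
        y1 = min(rects[i][2] for i in idx)
        y2 = max(rects[i][3] for i in idx)
        return (x2 - x1) * (y2 - y1)

    @lru_cache(maxsize=None)
    def best(mask):
        if mask & (mask - 1) == 0:       # single rectangle: leaf, no cost
            return 0
        cost = float("inf")
        sub = (mask - 1) & mask          # enumerate proper nonempty submasks
        while sub:
            other = mask ^ sub
            if sub < other:              # count each unordered split once
                cost = min(cost, best(sub) + best(other))
            sub = (sub - 1) & mask
        return bbox_area(mask) + cost

    return best((1 << n) - 1)

# The example from the question: expected optimum 129.
R = [(0, 1, 0, 1), (1, 2, 0, 2), (20, 25, 1, 5)]
print(min_tree_cost(R))  # → 129
```

This confirms that merging $R_1$ with $R_2$ first ($4 + 125 = 129$) beats the other two tree shapes.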

by Esantin at August 23, 2016 01:18 PM


Why don't we transmit at rates higher than the Shannon capacity if we are going to get a nonzero probability of error anyway?

Shannon capacity $C$ is the upper limit on a rate $R$ defined as the number of information symbols $k$ divided by the number of transmitted symbols $n$, that can be transmitted over a channel such that as $n \rightarrow \infty$, the probability of error goes to zero. If a rate $R > C $ is used, the probability of error is bounded away from zero.

Since $n$ is finite in practical applications, there is some nonzero probability of error. Thus, why would it matter if we transmit at a rate higher than $C$? Shannon's theorem predicts that such a rate will have a nonzero probability of error, but we already have a nonzero probability of error due to the fact that $n$ is finite. In other words, why don't we transmit at rates higher than $C$ if we are going to get a nonzero probability of error anyway?
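For concreteness, for the binary symmetric channel with crossover probability $p$ the capacity in the statement above is $C = 1 - H_2(p)$; a sketch:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """C = 1 - H2(p) for a binary symmetric channel with crossover prob p.
    Rates R < C are achievable with error probability vanishing as n grows;
    for R > C the error probability is bounded away from zero."""
    return 1.0 - h2(p)
```

The asymmetry the question is about lies in that last clause: below $C$ the finite-$n$ error can be driven as small as desired by taking $n$ larger, while above $C$ it cannot.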

by user2692162 at August 23, 2016 01:07 PM



TCTL / UPPAAL: how to verify a certain order of events?

I'd like to check if a certain order of events happens if another property holds true using UPPAAL and TCTL.

If A==true then eventually (B==true and eventually (C==true and eventually (D==true)))

But since "In contrast to TCTL, Uppaal does not allow nesting of path formulae," I'm not sure how to do this.

What I've got so far is only something like:

E$\diamond$ (A && B)

E$\diamond$ (A && C)

E$\diamond$ (A && D)

by Jim McAdams at August 23, 2016 12:50 PM


Does ZFS for Linux over stress VirtualBox?

I've been using MD raid + LVM for many years, but recently decided to take a look at ZFS. In order to try it, I created a VirtualBox VM with a similar layout to my main server - 7 'SATA' drives of various sizes.

I set it up with an approximation of my current MD+LVM configuration and proceeded to work out the steps I needed to follow to rearrange files, LVs, VGs etc, to make space to try ZFS. All seemed ok - I moved and rearranged PVs until I had the space set up over a period of 3 days uptime.

Finally, I created the first ZPool:

  pool: tank
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      raidz1-0  ONLINE       0     0     0
        sdb1    ONLINE       0     0     0
        sdc1    ONLINE       0     0     0
        sdd1    ONLINE       0     0     0
        sde1    ONLINE       0     0     0
        sdg1    ONLINE       0     0     0

errors: No known data errors

I created a couple of ZFS datasets and started copying files using both cp and tar. E.g. cd /data/video;tar cf - .|(cd /tank/video;tar xvf -).

I then noticed that I was getting SATA errors in the virtual machine, although the host system shows no errors.

Apr  6 10:24:56 model-zfs kernel: [291246.888769] ata4.00: exception Emask 0x0 SAct 0x400 SErr 0x0 action 0x6 frozen
Apr  6 10:24:56 model-zfs kernel: [291246.888801] ata4.00: failed command: WRITE FPDMA QUEUED
Apr  6 10:24:56 model-zfs kernel: [291246.888830] ata4.00: cmd 61/19:50:2b:a7:01/00:00:00:00:00/40 tag 10 ncq 12800 out
Apr  6 10:24:56 model-zfs kernel: [291246.888830]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  6 10:24:56 model-zfs kernel: [291246.888852] ata4.00: status: { DRDY }
Apr  6 10:24:56 model-zfs kernel: [291246.888883] ata4: hard resetting link
Apr  6 10:24:57 model-zfs kernel: [291247.248428] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  6 10:24:57 model-zfs kernel: [291247.249216] ata4.00: configured for UDMA/133
Apr  6 10:24:57 model-zfs kernel: [291247.249229] ata4.00: device reported invalid CHS sector 0
Apr  6 10:24:57 model-zfs kernel: [291247.249254] ata4: EH complete

This error occurs multiple times on various different drives, occasionally with a failed command of 'READ FPDMA QUEUED' or (twice) 'WRITE DMA', to the extent that the kernel eventually reports:

Apr  6 11:51:32 model-zfs kernel: [296442.857945] ata4.00: NCQ disabled due to excessive errors

This does not stop the errors being reported.

An internet search showed that this error had been logged about 4 years ago (for version 4.0.2 of VirtualBox) and was apparently considered fixed, but then reopened.

I'm running VirtualBox 4.3.18_Debian r96516 on Debian (Sid) kernel version 3.16.0-4-amd64 (which is also the guest OS as well as the host OS). ZFS is version 0.6.3 for Linux.

I would have thought more work would have been done on this in the intervening years, as I can't believe I'm the only person to try out ZFS under VirtualBox, so I would have expected this error to have been identified and resolved, especially as versions of both ZFS and VirtualBox are maintained by Oracle.

Or is it simply the case that ZFS stresses the virtual machine to its limits and the simulated drive/controller just can't respond fast enough?


In the 14 hours since I created the pool, the VM has reported 204 kernel ata errors. Most of the failed commands are 'WRITE FPDMA QUEUED', followed by 'READ FPDMA QUEUED', 'WRITE DMA' and a single 'FLUSH CACHE'. Presumably ZFS retried the commands, but so far I am wary of using ZFS on a real server if it produces so many errors on a virtual machine!

by StarNamer at August 23, 2016 12:49 PM



Original reference for Huffman shaped Merge Sort?

What is the first publication of the concept of optimizing merge sort by

  1. identifying maximal sequences of consecutive positions in increasing order (aka runs) in linear time; then
  2. repeatedly merging the two shortest such sequences and adding the result of this merging to the list of sorted fragments.
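The two steps above can be sketched as follows (run detection in one linear pass, then Huffman-order merging with a heap):

```python
import heapq
from itertools import count

def huffman_merge_sort(a):
    """Natural merge sort with Huffman-shaped merging: split the input into
    maximal non-decreasing runs, then repeatedly merge the two shortest runs."""
    if not a:
        return []
    runs, start = [], 0
    for i in range(1, len(a)):
        if a[i] < a[i - 1]:          # run boundary
            runs.append(a[start:i])
            start = i
    runs.append(a[start:])
    tie = count()                    # tie-breaker so heapq never compares runs
    heap = [(len(r), next(tie), r) for r in runs]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, r1 = heapq.heappop(heap)
        n2, _, r2 = heapq.heappop(heap)
        merged = list(heapq.merge(r1, r2))
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]
```

As with Huffman coding, merging the two shortest runs first minimizes the total work, which is what makes the scheme adaptive to presorted input.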

In some of my publications (e.g., I used this trick to sort faster and to generate a compressed data structure for permutation.

It seems that this trick was introduced before, just in the context of sorting faster, but neither I nor my student has been able to find the reference.

by Jeremy at August 23, 2016 12:03 PM

Fred Wilson

Understanding VCs

I saw Joe Fernandez‘ tweet a few days ago and thought “he is making an important point.”

VCs are not heroes. We are just one part of the startup ecosystem. We provide the capital allocation function and are rewarded when we do it well and eventually go out of business when we don’t do it well. I know. I’ve gone out of business for not doing it well.

If there are heroes in the startup ecosystem, they are the entrepreneurs who take the biggest risks and create the products, services, and companies that we increasingly rely on as tech seeps into everything.

VCs do have a courtside seat to the startup world by virtue of meeting and getting pitched by hundreds of founding teams a year and sitting in board meetings for many of these groundbreaking tech companies. We get to see things that most people don’t see and the result of that is that we often have insights that come from this unique view we are given of the startup sector.

Another thing that is important to know about VCs is that we operate in a highly competitive sector where usually only one or two VC firms are allowed to make a hotly contested investment. So in order to succeed, VCs need to market ourselves to entrepreneurs. There are many ways to do that and the best way is to back the most successful companies and be known for doing that. There is a reason that Mike Moritz and John Doerr were invited to lead Google’s initial VC round. By the time that happened, they had established themselves as the top VCs in the bay area and their firms, Sequoia and Kleiner Perkins, had established themselves as the top firms in the bay area.

Another way that VCs market ourselves to entrepreneurs is via social media. And blogging is one of the main forms of social media that VCs can use to do this. And, given that VCs have this unique position to gather insights from the startup sector, we can share these insights that we gain from our daily work with the world, and in particular entrepreneurs. If anyone has played this blogging game well enough to get into the top tier, it is me. I know of what I speak.

So how should entrepreneurs use this knowledge that is being imparted by VCs on a regular basis? Well first and foremost, you should see it as content marketing. That is what it is. That doesn’t mean it isn’t useful or insightful. It may well be. But you should understand the business model supporting all of this free content. It is being generated to get you to come visit that VC and offer them to participate in your Seed or Series A round. That blog post that Joe claimed is not scripture in his tweet is actually an advertisement. Kind of the opposite of scripture, right?

But you should also know that there is data behind that blog post, gained from hundreds (or thousands) of pitches and dozens (or hundreds) of board meetings. If VCs are good at anything, we are good at pattern recognition and inferring what these patterns are leading to. And so these blog posts that are not scripture, and are in fact advertising, can also contain information and sometimes even wisdom. So they should not be ignored either.

What I recommend to entrepreneurs is to listen carefully but not act too quickly. Get multiple points of view on important issues and decisions. And then carefully consider what to do with all of that information, filter it with your values, your vision, and your gut instinct. That’s what we do in our business and that is what entrepreneurs should do in their businesses.

If you are at a board meeting and a VC says “you should do less email marketing and more content marketing”, would you go see your VP Marketing after the meeting and tell them to cut email and double down on content? I sure hope not. I hope you would treat that VC comment as a single data point, to be heard, but most likely not acted on unless you get a lot of similar feedback.

VCs are mostly not idiots and can be quite helpful. But we are not gods and our word is not scripture. If you treat us like that, you are making a huge mistake. And I appreciate Joe making that point last week and am happy to amplify it with this post.

by Fred Wilson at August 23, 2016 12:01 PM


Conference Room A/V Build-Out

We recently moved to our new building at 1034 Wealthy. We took the opportunity to update the A/V equipment for our conference rooms. Previously, we largely relied on projectors for presentation capabilities, an external USB microphone/speaker for audio, built-in webcams on laptops for video, and a table where we staged everything. This worked, but it was certainly not ideal. With the new building, I had the opportunity to standardize a new conference room A/V build-out that would be better suited to our needs.

All of our new conference rooms now have a mobile TV stand which holds all of our A/V equipment. This includes a large flatscreen TV, dedicated webcam, dedicated microphone/speaker, and all necessary cables and connectors. Our new setup provides important capabilities required for many of our meetings, especially teleconferences: mobility, audio input, audio output, video input, and video output.



I chose the Kanto Living MTM82PL mobile TV mount, which includes the mounting hardware for a flatscreen TV, a small shelf, and a shelf for a webcam above the TV. It is a sleek, yet sturdy platform which allows our A/V build-out to be mobile. While largely dedicated to conference rooms, it can also be moved out to other areas–such as our cafe–for events or meet-ups.

Video Output

The Samsung 65″ Class KU6300 6-Series 4K UHD TV was selected as our primary display. This provides a much better picture and much higher resolution than the old projectors we were using. It has a native resolution of 3840 x 2160, a 64.5″ screen (diagonal), and 3 HDMI ports. While not all of our devices can support that resolution at this point (for example, AppleTVs only support up to 1080p), it still seemed like a worthwhile investment to help future-proof the solution.

Video Input

I chose the Logitech HD Pro Webcam C920 for video capabilities. It supports 1080p video when used with Skype for Windows, and 720p video when used with most other clients. The primary benefit of this webcam is that it can be mounted above the TV on the mobile stand, providing a wide view of the entire room–rather than just the person directly in front of the built-in laptop webcam.

Audio Input/Output

We had previously made use of the Phoenix Audio Duet PCS as a conference room “telephone” for web meetings–it provides better audio capabilities for a group of people than a stand-alone laptop. We placed one of these in each of the conference rooms as part of the A/V build-out. It acts as the microphone and speaker, while using the Logitech webcam for video input and the Samsung TV for video output.


Of course, I needed a few other items to tie all of these different capabilities together.


I purchased 20 ft. Luxe Series High-Speed HDMI cables so people can connect directly to the Samsung TVs for presentations. This type of connection allows computers to utilize the full resolution of the new TVs.


The Moshi Mini DisplayPort to HDMI adapter provides connectivity for those Atoms whose MacBooks do not natively support HDMI.

Presentation Helpers

I decided to purchase Apple TVs to allow for wireless presentation capabilities. With AirPlay, Macs (and other compatible devices) can transmit wirelessly to the TV–without the need for an HDMI cable. This is convenient for getting up and running quickly without any cable clutter, but it isn’t always appropriate (which is why a direct HDMI connection is available as well).

Cable Management

In addition to the standard cable ties and other cable management tricks, I’ve found that Cozy Industries, makers of the popular MagCozy, also makes a DisplayCozy. This helps keep the Moshi HDMI adapter with the HDMI cable.

Power Distribution

While the mobile TV cart provides a great deal of flexibility, the new building also has wide spaces between electrical outlets. To ensure that the A/V build-out would be usable in most spaces, I decided to add a surge protector with an extra-long cord. The Kensington Guardian 15′ works well for this.

Finished Product

Atomic Mobile A/V Solution

The post Conference Room A/V Build-Out appeared first on Atomic Spin.

by Justin Kulesza at August 23, 2016 12:00 PM


Shifted SABR for negative strikes

I am trying to apply SABR on EUR inflation caplets, with positive forward and negative strikes. Classical BS pricing is undefined, and so is SABR. I have read about the shifted SABR, which is supposed to accept negative strikes, but I was wondering whether anyone is aware of an existing implementation on Matlab for instance.

I have fitted the standard SABR parameters on positive strikes and modified some existing SABR code, adding the shift to the strike and the forward rate in the volatility equation. Now, I am feeding this modified volatility equation with my negative strikes and forward + the fitted SABR parameters, but nothing seems to have changed: it is still impossible to compute vols for negative strikes.

Do I have to feed the original strikes or the shifted strikes to the shifted model? Do I have to shift the forward as well? Is anyone aware of a better procedure for using shifted SABR with negative rates?

Thanks a lot!
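For reference, one common convention is to keep the original (possibly negative) strikes and apply the displacement to both strike and forward inside the formula, so that vols exist whenever $K + s > 0$. A minimal sketch, assuming Hagan's (2002) lognormal approximation; parameter values are illustrative, not calibrated:

```python
import math

def sabr_lognormal_vol(F, K, T, alpha, beta, rho, nu):
    """Hagan et al. (2002) lognormal SABR implied-vol approximation."""
    fk_b = (F * K) ** ((1 - beta) / 2)           # (F K)^{(1-beta)/2}
    log_fk = math.log(F / K)
    if abs(log_fk) < 1e-10:                       # ATM limit
        factor = alpha / fk_b
    else:
        z = (nu / alpha) * fk_b * log_fk
        xz = math.log((math.sqrt(1 - 2 * rho * z + z * z) + z - rho)
                      / (1 - rho))
        factor = (alpha / (fk_b * (1 + (1 - beta) ** 2 / 24 * log_fk ** 2
                                     + (1 - beta) ** 4 / 1920 * log_fk ** 4))
                  ) * z / xz
    correction = 1 + ((1 - beta) ** 2 / 24 * alpha ** 2 / fk_b ** 2
                      + rho * beta * nu * alpha / (4 * fk_b)
                      + (2 - 3 * rho ** 2) / 24 * nu ** 2) * T
    return factor * correction

def shifted_sabr_vol(F, K, T, alpha, beta, rho, nu, shift):
    # Shifted SABR: pass the ORIGINAL strike and forward; the displacement
    # is applied to both inside, so K + shift must be positive.
    return sabr_lognormal_vol(F + shift, K + shift, T, alpha, beta, rho, nu)

# Negative strike, positive forward: well-defined once shifted.
vol = shifted_sabr_vol(F=0.01, K=-0.005, T=5.0, alpha=0.02, beta=0.5,
                       rho=-0.2, nu=0.4, shift=0.02)
```

Note that parameters fitted to the unshifted model on positive strikes are not reusable as-is; the shifted model should be recalibrated with the shift in place, and the resulting vol quoted and priced with shifted Black.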

by KP6 at August 23, 2016 11:55 AM


Easy print / auto push button / programming

Please, can you help me with printing in Windows via the right mouse button? I tried it, but the program (labelStar) shows me a print dialogue. I tried to edit the .exe in Resource Hacker and PE Explorer, but I don't know how to edit the file. I would like to make automatic confirmation of this dialogue, or to edit registry parameters for automatic confirmation. Here is the exe file:

thank you

by Fiínek Cahů at August 23, 2016 11:47 AM



(Query, Document, Relevance) free dataset for building an information retrieval system

I'm interested in finding a dataset like the "English Relevance Judgements File List":

This dataset contains labelled pairs of queries and documents. However, it depends on a non-free corpus, called "Data - English Documents":

Do you know of any free dataset(s) similar to this one?

Side-note: The dataset will be used in a research project for building an information retrieval system based on neural networks.

by Ahmed at August 23, 2016 11:40 AM

Planet Emacsen

Irreal: Scimax

The ACM Technews newsletter has a short piece on John Kitchin's Scimax project. Here's the CMU article on Scimax, which gives an overview of the project.

Basically, Scimax is a collection of (mostly) Elisp utilities that Kitchin has put together to help with his group's writing and publishing of papers. It centers on using Org mode to write papers in a reproducible-research style and then publish them in the format required by the journal the paper is being submitted to. There are also some tools to aid in teaching. For more details, check out Kitchin's Scimax page.

The nice thing about Scimax is that all the utilities are packaged up into a single project repository that anyone interested can download and use. The project is hosted on Github if you're interested.

by jcs at August 23, 2016 11:15 AM


Why are Scala vals not lazy by default

I have noticed that I almost exclusively use lazy val assignments, as they often avoid unnecessary computations, and I can't see many situations where one would not want to do so (dependency on mutable variables being a notable exception, of course).

It would seem to me that this is one of the great advantages of functional programming and one should encourage its use whenever possible and, if I understood correctly, Haskell does exactly this by default.

So why are Scala values not lazy by default? Is it solely to avoid issues relating to mutable variables?
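For readers outside Scala: a lazy val is computed at most once, on first access, and cached thereafter. A rough Python analogue using functools.cached_property (the class and attribute names are made up for illustration):

```python
from functools import cached_property

class Pipeline:
    calls = 0  # counts how many times the body actually runs

    @cached_property
    def table(self):               # roughly: lazy val table = ...
        Pipeline.calls += 1        # executed only on first access
        return [i * i for i in range(5)]

p = Pipeline()
assert Pipeline.calls == 0         # not computed at construction (lazy)
assert p.table == [0, 1, 4, 9, 16]
assert p.table is p.table          # cached: same object on every access
assert Pipeline.calls == 1         # body ran exactly once
```

The sketch also hints at one cost of laziness: every access must check whether the value has been computed yet, and in a concurrent setting that check needs synchronization, which is part of the usual trade-off discussion.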

by TimY at August 23, 2016 11:10 AM

Error in graphlab.SFrame('')

I am doing the Machine Learning course from Coursera by the University of Washington, in which I am using GraphLab from an IPython notebook. During practice, when I execute the command below:

sales = graphlab.SFrame('')

I am getting the error:

InvalidProductKey                         Traceback (most recent call last)
<ipython-input-3-c5971b60b216> in <module>()
----> 1 sales=graphlab.SFrame('')

/opt/conda/lib/python2.7/site-packages/graphlab/data_structures/sframe.pyc in __init__(self, data, format, _proxy)
865             self.__proxy__ = _proxy
866         else:
--> 867             self.__proxy__ = UnitySFrameProxy(glconnect.get_client())
868             _format = None
869             if (format == 'auto'):

/opt/conda/lib/python2.7/site-packages/graphlab/connect/main.pyc in get_client()
138     """
139     if not is_connected():
--> 140         launch()
141     assert is_connected(), ENGINE_START_ERROR_MESSAGE
142     return __CLIENT__

/opt/conda/lib/python2.7/site-packages/graphlab/connect/main.pyc in launch(server_addr, server_bin, server_log, auth_token, server_public_key)
 90         if server:
 91             server.try_stop()
 ---> 92         raise e
 93     server.set_log_progress(True)
 94     # start the client

 InvalidProductKey: Product key not found.

(Note: the IPython notebook and the data file are in the same folder.)

by Lok at August 23, 2016 10:29 AM

Python implementation of voted perceptron

Does anyone know how to implement the voted perceptron algorithm (reported in this article) in Python? What could a dual representation of the voted perceptron be?
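For reference, a minimal sketch of the voted perceptron as described by Freund and Schapire, in its primal form (the dual form would instead store the mistake examples and their counts rather than explicit weight vectors):

```python
import numpy as np

def train_voted_perceptron(X, y, epochs=10):
    """Voted perceptron (Freund & Schapire, 1999). Labels y must be +/-1.
    Every intermediate weight vector is kept with its survival count."""
    w, c = np.zeros(X.shape[1]), 1
    weights, counts = [], []
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:      # mistake: retire current w
                weights.append(w.copy())
                counts.append(c)
                w, c = w + yi * xi, 1
            else:
                c += 1                       # w survives another example
    weights.append(w.copy())
    counts.append(c)
    return weights, counts

def predict(weights, counts, x):
    # Each stored w casts the vote sign(w.x), weighted by how long it survived.
    vote = sum(ci * np.sign(np.dot(wi, x)) for wi, ci in zip(weights, counts))
    return 1 if vote >= 0 else -1

# Tiny linearly separable example
X = np.array([[1.0, 1], [2, 2], [-1, -1], [-2, -1]])
y = np.array([1, 1, -1, -1])
ws, cs = train_voted_perceptron(X, y)
assert predict(ws, cs, np.array([3.0, 3])) == 1
assert predict(ws, cs, np.array([-3.0, -3])) == -1
```

The weighted vote is what distinguishes it from the plain perceptron, which would discard all but the final weight vector.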


by Nick.b at August 23, 2016 10:24 AM


Planet Emacsen

Pragmatic Emacs: Search or swipe for the current word

It is often handy to search for the word at the current cursor position. By default, you can do this by starting a normal isearch with C-s and then hitting C-w to search for the current word. Keep hitting C-w to add subsequent words to the search.

If, like me, you use swiper for your searches, you can obtain the same effect using M-j after you start swiper.

This is all very nice, but both of those solutions above search for the string from the cursor position to the end of the word, so if “|” marks the cursor position in the word prag|matic, then either method above would search for matic. I made a small tweak to the relevant function in the ivy library that powers swiper so that the whole of the word is used, so in the example above M-j would search for the full pragmatic string.

Here is the code:

;; version of ivy-yank-word to yank from start of word
(defun bjm/ivy-yank-whole-word ()
  "Pull next word from buffer into search string."
  (interactive)
  (let (amend)
    (with-ivy-window
      ;; move to last word boundary
      (re-search-backward "\\b")
      (let ((pt (point))
            (le (line-end-position)))
        (forward-word 1)
        (if (> (point) le)
            (goto-char pt)
          (setq amend (buffer-substring-no-properties pt (point))))))
    (when amend
      (insert (replace-regexp-in-string "  +" " " amend)))))

;; bind it to M-j
(define-key ivy-minibuffer-map (kbd "M-j") 'bjm/ivy-yank-whole-word)

by Ben Maughan at August 23, 2016 09:45 AM


What are alternatives of Gradient Descent?

Gradient descent has the problem of local minima. In the worst case, we may need to run gradient descent from exponentially many starting points to find the global minimum.

Can anybody tell me about alternatives to gradient descent, with their pros and cons?
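To make the local-minimum problem and the simplest mitigation (multi-start, i.e. restarting from several initial points) concrete, a toy sketch on a one-dimensional objective with two basins; the function and all values are illustrative:

```python
def grad_descent(grad, x0, lr=0.01, steps=2000):
    """Plain gradient descent with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Toy objective with two basins: f(x) = (x^2 - 1)^2 + 0.3 x
# (global minimum near x = -1, worse local minimum near x = +1)
f = lambda x: (x * x - 1) ** 2 + 0.3 * x
grad = lambda x: 4 * x * (x * x - 1) + 0.3

# Plain GD from x0 = 2 lands in the worse (right-hand) basin...
x_bad = grad_descent(grad, 2.0)
# ...multi-start GD keeps the best result over several starting points.
x_best = min((grad_descent(grad, x0) for x0 in (-2.0, 0.5, 2.0)), key=f)
assert f(x_best) < f(x_bad)
```

Multi-start is of course only one answer; others commonly discussed are stochastic gradient descent (noise helps escape shallow minima), momentum methods, simulated annealing, and derivative-free methods, each trading computation for robustness differently.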


by Nusrat at August 23, 2016 09:13 AM


Are you all using ad blockers? You should be. The latest ...

Are you all using ad blockers? You should be. The latest trend from the US is websites spamming their visitors by e-mail. No, not websites where you created an account and gave them your e-mail address. Ad networks can match users across sites; that is precisely their function. And once you have entered your e-mail address anywhere within the network, any other participant can shower you with spam.

August 23, 2016 09:00 AM

Well, some submissions responding to my call yesterday have ...

Well, some submissions responding to my call yesterday have come in. Here is the first batch.
Up front: I have nothing to say in favor of the foundation, but I do have a tinfoil-hat theory for you:

Maybe the scandal around the AA Foundation is a backfire against Bertelsmann now censoring public opinion on Facebook. That is actually the much bigger blunder, but in light of the AA debate it seems to be going almost unnoticed.

Good point! The press reported that Facebook has outsourced its censorship to Arvato, but not everyone will know that Arvato belongs to Bertelsmann.

Another reader pointed to this blog post at Mobilegeeks, which I cannot take seriously, however, because it opens right away with this:

accusing, of all people, those of wanting to restrict freedom of expression who are trying to make it possible again for everyone to voice their own opinion without fear
Sorry, that goes too far for me. Not even the AA Foundation itself claims to stand up for free expression without fear. The author seems to have taken a wrong turn somewhere along the way.

A third reader points to this EU paper for more tolerance, which, read from the corresponding angle, sounds like the introduction of an indoctrination machine (especially sections 8 and 9).

Gentle greetings, Sir, what are your extremes?

Someone else pointed to the AA Foundation's self-defense, which honestly strikes me as rather embarrassing. But maybe you see that differently.

The only advocate I can take seriously unfortunately has a pile of personal information in his submission, which I will cautiously try to filter out as long as I do not know for certain that I may publish it.

For years I campaigned against the brown house in our village [name redacted]. We founded a citizens' initiative and personally stood up to the Nazis. Yes, that is the house in which Ralf Wohlleben, Rene Kapke and co. lived and operated. Yes, I got to know these people personally and argued with them both publicly and privately. Yes, that is the district where the three NSU people grew up. Yes, that is the district with the garage where the pipe bombs were found, the discovery of which marked the start of the NSU going underground. Yes, that is the house in which, according to the indictment, the weapons delivery and support for the NSU took place. The house has since been demolished. Our task is done.

But back to the point: our initiative received support from the AA Foundation. We had financial support in the form of prize money (not from the foundation). When we wanted to run a poster campaign calling on schools to engage with the topic of right-wing extremism, the AA Foundation supported us vigorously. They made sure that the best posters were placed on the big billboards along the arterial roads, without touching our money and without dumping the formal paperwork on us.

In short, I only ever experienced the AA Foundation as a restrained organization supporting local initiatives. Admittedly, all of that was a few years ago. I cannot say anything about the "pillory" campaigns; they never registered in the networks I know and were never a topic there.

So I would ask you to distinguish between the foundation's various activities and not to extrapolate from one catastrophe to all the other areas.

Update: Regarding the Mobilegeeks piece, a buddy of mine thinks I dismissed it too hastily. His interpretation is that the foundation does not want to censor anyone, but only to fight the excesses that make it impossible for people to voice their opinion without fear. So read it for yourselves.

Update: Reader letter:

not sure whether this is what you are looking for, because it does not explain/defend what else the AA does, but they maintain a chronicle of anti-refugee incidents, which in my eyes is a very important service (at least I have not found such a complete and well-sourced list of this kind anywhere else)
But presumably that is not what you are looking for...
Yes, it is; that is exactly what I am looking for.

Update: And another great reader letter:

the Amadeu Antonio Foundation is really stepping in it right now, but we still need it for these other reasons.
It is definitely the lesser evil compared to the Nazis!
(Not exactly a winning pitch so far, but stay with me.)
It may be hard to imagine from Tegernsee or Munich, from Charlottenburg or Prenzlauer Berg, but here in eastern Mecklenburg-Vorpommern / northeastern Brandenburg, for example, as in large parts of the (eastern) provinces, there is a dull brown subculture that has been established for decades; by now the third generation is growing up in this milieu. Until the turn of the millennium, apart from a few pastors, there was simply nobody here offering anything to young people who had no interest in Landser music and punk-bashing. The official doctrine was that of "accepting youth work", i.e. the young skinheads went to the municipally funded youth center to do their Hitler routine.

For people aged 12 to 18 who have no use for the church and do not want to swim with the racist current, fleeing is really the only option left.
The AAS's achievement is to provide encouragement, strength and regional networking for such people under these conditions. With a multitude of small socio-cultural initiatives they remain active across the region and enable activities such as multicultural festivals, intercultural events and courses at vocational school centers, where the kids can, perhaps for the first time in their lives, really talk to someone from Cameroon or Lebanon, or simply make music together, cook together, and the like.

The AAS supports mobile counseling teams which, in the small towns where the skinheads have once again trashed a non-right-wing club or are rioting in front of the refugee shelter, back up the ordinary people and encourage them to put up resistance to the mob.
Not least, the local support initiatives for refugees who have been thrown into this hostile climate can obtain know-how, technical help and all kinds of support from the AAS.
It is also the AAS that keeps showing up at the ministries and subordinate agencies, demanding action and funds against the mainstream of resentment and ignorance.

I myself am a tradesman and contribute my small share to a more open, more tolerant society, privately and at work, as best I can.

Originally at home in Berlin, I have been observing and living the everyday life out here for almost 15 years.

So I am not professionally active in the social sector and view the development more as an active citizen (in the sense of citoyen).
My impression is that a good number of people would no longer be here without the steady, small-scale local work that the AAS does.
Especially since 2014/15, with the large influx of migrants into the flat countryside, one can see how necessary the presence of a small but active faction of open-minded, non-xenophobic people is.
It feels good not to be left alone with the blockheads, and I owe that feeling in good part to initiatives and campaigns from the AAS portfolio, for especially in the political arena there is hardly anyone else who would have stayed the course here for so long.

I can say little about this whole no-hate-speech affair, since I use neither Twitter nor Facebook.
Possibly the foundation simply picked up the wrong people for its online activities.
Leaving aside Don Alphonso's private crusade against certain Berlin ex-Pirates, I must agree with your reader that the broad Bertelsmann censorship of the so-called social media seems to me the bigger scandal compared to the few thousand euros the Family Ministry sank into a bad new-right wiki.

If you look at who is now practicing outrage, and with which arguments ("Stasi! Stasi! STAAASI!!!"), you can only wonder which and how many AAS experts and watchdogs there are in this country.

Wonderful; many thanks to the letter writers. This is what I had hoped for. Audiatur et altera pars.

August 23, 2016 09:00 AM

Deal of the day: Chancellor Merkel has now, in a ...

Deal of the day:
Chancellor Merkel has now, in a newspaper interview, demanded loyalty from all citizens of Turkish origin. In return she wants to "try to have an open ear".
The key word is "try", got it? That is how Merkel sees her job!

August 23, 2016 09:00 AM

Where is Tor headed, now that all the ...

Where is Tor headed, now that all the volunteers are distancing themselves?

You can see it quite well here.

The collection-plate model, plus a load of FUD along the lines of "if volunteers do this, it will never amount to anything and carries risks anyway". So better as a commercial service with industry partners from the Five Eyes!

August 23, 2016 09:00 AM


Why do GF(2) and CRC not give the same result?

I am trying to understand (and implement functions for) binary polynomial division.

My first step was to understand and compare the results of two online tools. The first is a formal GF(2) polynomial calculator. The second is a CRC polynomial calculator.

I expected the remainder of the formal calculator being equal to the checksum of CRC calculator.

So I entered the following data to the formal calculator:

A = 0100000101000001 (should be same as "AA" ASCII data)
B = 11111

And I entered the following to the CRC calculator:

CRC order = 4
CRC polynom = F
Data sequence = AA
Initial = 0
Direct, no reverse input, no reverse output

I used width 4 and polynomial F (instead of 5 and 1F) since (as far as I understand) the CRC calculator expects polynomials in standard notation (that omits the leading 1-bit).

The CRC calculator says the checksum is 2, while the formal calculator says the binary remainder is 100 = 4.

Why don't I get same results?
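The discrepancy is consistent with the usual CRC convention of appending `width` zero bits to the message (i.e. multiplying the message polynomial by x^4) before dividing, whereas the formal calculator divides the message as-is. A quick check in Python reproduces both numbers:

```python
def mod2_div(dividend: int, poly: int) -> int:
    """Remainder of GF(2) polynomial division (ints encode coefficient bits)."""
    deg = poly.bit_length() - 1
    while dividend.bit_length() - 1 >= deg:
        # XOR the divisor in, aligned to the dividend's leading bit
        dividend ^= poly << (dividend.bit_length() - 1 - deg)
    return dividend

msg = 0b0100000101000001          # "AA" in ASCII
poly = 0b11111                    # x^4 + x^3 + x^2 + x + 1

plain = mod2_div(msg, poly)       # formal GF(2) remainder -> 0b100 = 4
crc = mod2_div(msg << 4, poly)    # message augmented with 4 zero bits -> 2
```

So the two tools agree once the message is augmented; reflection options and a nonzero initial value would introduce further differences, but they are disabled here.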

by Silicomancer at August 23, 2016 09:00 AM

Is this a correct argument for the O(n log n) bound on sorting algorithms? [on hold]

Let $A$ be the array to be sorted, of size $n$. There are $n!$ ways to permute the elements of $A$, and any comparison-based sorting algorithm must identify the "correct" permutation. Since each comparison has only two possible outcomes, the algorithm's decision tree must have at least $n!$ leaves, hence depth at least $\lg (n!)$, so the algorithm must make at least $\lg (n!)$ comparisons in the worst case. Now $\lg (n!) = \sum_{i=1}^n \lg i \sim \int_1^n \lg x \, dx \sim n \lg n$. QED

by Anonymous at August 23, 2016 08:56 AM


Application of "Cocktail party effect" in phones

I recently learned about "Cocktail party problem":

"The cocktail party effect is the ability to focus on a specific human voice while filtering out other voices or background noise." [1]

Is this problem "solved" in phones? It doesn't look like it is, because when I call someone who is at a party or some other noisy place, I still cannot hear his/her voice very clearly. If the solution isn't applied in phones, my question is: why? Is it because the microphones are already good enough and it would bring only a very small improvement?

by FilipR at August 23, 2016 08:55 AM


What is this approximation/error-reduction method called?

I'm wondering if anyone could help me find my footing in an approach I am taking with a student in my audio programming class for creating more accurate pitch-detection algorithms. The approach is not limited to pitch detection, and in fact seems similar to Newton's method, Euler's method, Horner's method, and so on. It is a very simple and general idea, and must have some background in numerical methods. I am looking for pointers to the literature.

Here is the idea. We have a function f which takes a signal and returns the fundamental frequency (such algorithms are close cousins to the Discrete Fourier Transform). In order to test its accuracy, I created simple sine wave signals of precise frequencies and tested the algorithm, and graphed the errors over a particular range; basically a perfect f would be the identity function, so we just had to record the deviation from the identity. The errors are basically sinusoidal. So I stored the errors in an array, and use cubic interpolation to create a continuous error function, and built that into the last stage of the algorithm. Of course, there is a problem, because the errors showed the deviation from a perfect f, and the original f is not perfect, so there would be errors in the errors, so to speak. So I iterated the process, correcting successively for errors in the errors, and the algorithm gets better each time. I have not yet figured out whether it will converge to some minimal error. I also have not tested it in musical settings. But it is very promising, and seems like a generally useful technique.

Separate from a programming trick, I would like to understand some of its properties such as convergence and so on. Anyone have any pointer, keywords, etc. for me to pursue this? I'm guessing it is a standard technique in numerical methods.
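For concreteness, a stripped-down sketch of the scheme, using linear instead of cubic interpolation and a made-up biased estimator standing in for the pitch detector (all functions and values here are illustrative assumptions, not the original algorithm):

```python
import numpy as np

# Toy stand-in for the detector: a smooth, monotonic estimator with a
# sinusoidal bias, like the error curves described above.
def f(x):
    return x + 0.2 * np.sin(x)

truth = np.linspace(0.0, 10.0, 101)   # test inputs with known ground truth
est = f(truth)
err = est - truth                     # sampled error curve (est is monotonic)

def f_corrected(x):
    y = f(x)                          # raw estimate
    return y - np.interp(y, est, err) # subtract the interpolated error

# One correction round already shrinks the error dramatically on this toy;
# iterating would re-measure the residual error of f_corrected and repeat.
assert abs(f_corrected(3.33) - 3.33) < abs(float(f(3.33)) - 3.33)
```

Viewed this way, each round applies a fixed-point-style update to the estimator, which suggests the convergence question can be framed as whether the composed correction map is a contraction; "iterative bias correction" and "fixed-point iteration" seem like reasonable keywords to start from.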

by Wayne Snyder at August 23, 2016 08:50 AM


Does the Knock-out option price go to $0$ when the stock price goes to the barrier $B$?

I am reading Steven Shreve's book "Stochastic Calculus for Finance II: Continuous-Time Models", page 304. My intuition is that as the stock price gets closer to the barrier, it becomes more and more likely that the price will cross the barrier in the near future, so the option has a large probability of becoming worthless. Consequently, the price of the option should approach zero. But I cannot justify this intuition from the formula on page 304. Can someone explain this? Thanks a lot.

The formula is $$V(0)=S(0)I_1-KI_2-S(0)I_3+KI_4$$ where $$\quad I_1=\frac{1}{\sqrt{2\pi T}}\displaystyle\int_{k}^be^{\sigma w-rT+\alpha w-\frac{1}{2}\alpha^2T-\frac{1}{2T}w^2}dw$$

$$I_2=\frac{1}{\sqrt{2\pi T}}\displaystyle\int_{k}^be^{-rT+\alpha w-\frac{1}{2}\alpha^2T-\frac{1}{2T}w^2}dw$$ and $$\quad I_3=\frac{1}{\sqrt{2\pi T}}\displaystyle\int_{k}^be^{\sigma w-rT+\alpha w-\frac{1}{2}\alpha^2T-\frac{2}{T}b^2+\frac{2}{T}bw-\frac{1}{2T}w^2}dw$$

$$I_4=\frac{1}{\sqrt{2\pi T}}\displaystyle\int_{k}^be^{-rT+\alpha w-\frac{1}{2}\alpha^2T-\frac{2}{T}b^2+\frac{2}{T}bw-\frac{1}{2T}w^2}dw$$
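As a numerical sanity check of the intuition (not of the closed-form integrals themselves), a crude Monte Carlo sketch of an up-and-out call with discrete barrier monitoring; parameters are illustrative:

```python
import math, random

def up_and_out_call_mc(S0, K, B, r, sigma, T, steps=100, paths=4000, seed=42):
    """Crude Monte Carlo for an up-and-out call under GBM with discrete
    barrier monitoring (a sketch, not a production pricer)."""
    rng = random.Random(seed)
    dt = T / steps
    total = 0.0
    for _ in range(paths):
        S, alive = S0, True
        for _ in range(steps):
            S *= math.exp((r - 0.5 * sigma ** 2) * dt
                          + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
            if S >= B:                 # knocked out
                alive = False
                break
        if alive:
            total += max(S - K, 0.0)
    return math.exp(-r * T) * total / paths

far = up_and_out_call_mc(S0=80.0, K=80.0, B=100.0, r=0.02, sigma=0.2, T=0.5)
near = up_and_out_call_mc(S0=99.5, K=80.0, B=100.0, r=0.02, sigma=0.2, T=0.5)
assert near < far                      # price shrinks as S0 approaches B
```

The same limit can be read off the formula: as $S(0) \uparrow B$ the reflection terms $I_3, I_4$ approach $I_1, I_2$, so the four terms cancel and $V(0) \to 0$.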

by Resorter at August 23, 2016 08:50 AM


How to calculate sum of binomial coefficients efficiently?

I want to compute the sum

$$\binom{n}{0}+\binom{n}{2}+\binom{n}{4}+\binom{n}{6}+\dots+\binom{n}{k} \bmod 10^9+7$$

where $n$ and $k$ can be up to $10^{14}$ and $k\le n$.

I found several links on Stack Overflow about calculating sums of binomial coefficients, but none of them works for constraints as large as $10^{14}$. I tried changing the series using the relation $\binom{n}{k}=\binom{n-1}{k-1}+\binom{n-1}{k}$ and came up with a brute-force solution, which is of no use. Is there any way to do it efficiently?

This question is from the TCS codevita 2016 round 2 contest, which has ended.

by srd091 at August 23, 2016 08:46 AM


Is a bondfuture an IRD or a Credit Derivative?

I need to categorize a BondFuture trade in one of the five major asset classes and I am not sure if it should put it to the interest rate asset class or the credit asset class.

A quick (and dirty) thought is to split the bond trade into an IR swap and a CDS.

For example, buying a fixed rate bond could be 'linked' with going short on an IR Swap and short a CDS on the issuer.

Any other ideas?


by sen_saven at August 23, 2016 08:45 AM

How to calculate a forward-starting swap with forward equations?

I have been trying to solve this problem for some time, but I cannot get the correct answer. The problem is the following.

Compute the initial value of a forward-starting swap that begins at $t=1$, with maturity $T=10$ and a fixed rate of 4.5%. (The first payment then takes place at $t=2$ and the final payment takes place at $t=11$ as we are assuming, as usual, that payments take place in arrears.) You should assume a swap notional of 1 million and assume that you receive floating and pay fixed.)

We also know that

  • $r_{0,0}=5\%$
  • $u=1.1$
  • $d=0.9$
  • $q=1−q=1/2$

Using forward equations from $t=1$ to $t=9$, I cannot solve the problem.

Here is what I have done in Excel, with a final result of -31076, but it is not the correct answer:

[Excel screenshot]
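For anyone checking their spreadsheet, here is a minimal sketch of the forward-equation (elementary price) approach in Python, under the usual lattice convention $r_{k,s} = r_{0,0}\, u^s d^{k-s}$ (that convention, and valuing each arrears payment one period back at its setting node, are assumptions; the course setup may differ):

```python
# Elementary-price (forward-equation) sketch for the forward-starting swap:
# rates set at t = 1..10, net payments (floating - fixed) paid at t + 1.
r00, u, d, q = 0.05, 1.1, 0.9, 0.5
fixed, notional = 0.045, 1_000_000
first_set, last_set = 1, 10

def rate(k, s):
    return r00 * u ** s * d ** (k - s)

# Forward equations for elementary (Arrow-Debreu) prices P[k][s].
P = [[1.0]]
for k in range(last_set):
    nxt = [0.0] * (k + 2)
    for s in range(k + 1):
        nxt[s] += (1 - q) * P[k][s] / (1 + rate(k, s))
        nxt[s + 1] += q * P[k][s] / (1 + rate(k, s))
    P.append(nxt)

# A net payment made at t + 1 is worth (r - fixed)/(1 + r) at the time-t
# node where the rate is set; sum these against the elementary prices.
value = notional * sum(P[t][s] * (rate(t, s) - fixed) / (1 + rate(t, s))
                       for t in range(first_set, last_set + 1)
                       for s in range(t + 1))
print(round(value))
```

A useful consistency check along the way: the elementary prices at each date must sum to the corresponding zero-coupon bond price (e.g. sum(P[1]) == 1/1.05).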

by Katherine99 at August 23, 2016 08:41 AM

Why is the Black 76 model not considered an interest rate model?

The Black 76 model is one of the standard models for interest rate derivatives like pricing caps, floors, swaptions, etc.

The Black 76 model is given as $$dF_t = \sigma F_t dW_t$$ so it models the dynamics of the forward rate $F_t$ which implies a certain term structure. Why is the Black 76 model not considered an interest rate model (like Vasicek) in the literature even though it is used for pricing interest rate derivatives?
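One common way of framing the distinction: Black 76 only parameterizes the marginal lognormal distribution of a single forward at its fixing date, each caplet or swaption being priced with its own forward and its own vol, whereas a model like Vasicek specifies dynamics for the short rate from which the whole curve is derived jointly. The pricing building block itself is just a call on the forward; a minimal sketch (variable names illustrative):

```python
from math import log, sqrt, erf, exp

def norm_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def black76_call(F, K, sigma, T, df):
    """Black (1976) price of a call on a forward F, struck at K,
    discounted with discount factor df (accrual factors omitted)."""
    d1 = (log(F / K) + 0.5 * sigma ** 2 * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return df * (F * norm_cdf(d1) - K * norm_cdf(d2))
```

Nothing in this formula constrains how different forwards relate to one another over time, which is exactly the joint-dynamics content a term-structure model is expected to supply.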

by dnl at August 23, 2016 08:31 AM

HFT to blame for Flash Crashes?

Some people 1, 2, 3 claim that High Frequency Trading is partly to blame for the extreme volatility in the markets yesterday (24 August 2015).

Is that true?

Is the amount HFTs move even enough to push the markets down like that? Does this behaviour align with the way they operate?

How can you explain the low Dow Jones market open? Isn't it more likely that private investors just sell at market open? Why would HFTs even trade directly to market open when there is no arbitrage to make?

by joachim at August 23, 2016 07:59 AM



Using quantlib to price swaps with different payment and calculation resets for floating leg

I understand the VanillaSwap object assumes that payment and calculation resets are the same, so is there any way to use QuantLib to price a swap with different reset and calculation frequencies (say, payment is semiannual but resets are annual)?

A few candidates I've considered are:

  1. NonstandardSwap: however I think this does not allow different payment and reset schedules too.

  2. Swap: it takes 2 legs, but Leg itself is virtual. There are many ways of implementing this; one is to use IborCoupon, but that seems to require creating every single coupon individually in order to construct the Leg.

Is there any other simpler way to deal with this given that everything else is similar to a VanillaSwap except using different payment and calculation dates?

by AZhu at August 23, 2016 07:42 AM


scikit learn decision tree model evaluation

Here are the related code and documentation. I'm wondering: for a default cross_val_score call, without explicitly specifying a scorer, does the output array contain precision, AUC, or some other metric?

Using Python 2.7 with miniconda interpreter.

>>> from sklearn.datasets import load_iris
>>> from sklearn.cross_validation import cross_val_score
>>> from sklearn.tree import DecisionTreeClassifier
>>> clf = DecisionTreeClassifier(random_state=0)
>>> iris = load_iris()
>>> cross_val_score(clf, iris.data, iris.target, cv=10)
array([ 1.     ,  0.93...,  0.86...,  0.93...,  0.93...,
        0.93...,  0.93...,  1.     ,  0.93...,  1.      ])

regards, Lin

by Lin Ma at August 23, 2016 07:40 AM

My loss starts increasing as I decay learning rate, Tensorflow?

I'm using exponential decay to decay my learning rate after every 10 epochs. As you can see in the output below, as my learning rate starts to decrease, my loss starts to increase. I've tried several variations and the same thing happens every time. What could be going wrong?

 global_step = tf.Variable(0, name="global_step", trainable=False)
 decayed_learning_rate = tf.train.exponential_decay(learning_rate = 0.0001,global_step = global_step,decay_steps = 1000, decay_rate = 0.6, staircase = True)
 optimizer= tf.train.MomentumOptimizer(learning_rate = decayed_learning_rate, momentum = 0.9)
 minimize_loss = optimizer.minimize(loss, global_step=global_step)  

Here is the output:

    Epoch Finished
    Loss after one Epoch(Training) = 8.291080, Training Accuracy= 0.18000
    Loss after one Epoch(Validation) = 8.464677, Validation Accuracy= 0.10000
    Loss after one Epoch(Test) = 8.631430, Test Accuracy= 0.13000
    Epoch Finished
    Loss after one Epoch(Training) = 4.619487, Training Accuracy= 0.12000
    Loss after one Epoch(Validation) = 4.835144, Validation Accuracy= 0.14000
    Loss after one Epoch(Test) = 5.233496, Test Accuracy= 0.09000
    Epoch Finished
    Loss after one Epoch(Training) = 4.591153, Training Accuracy= 0.10000
    Loss after one Epoch(Validation) = 4.878084, Validation Accuracy= 0.09000
    Loss after one Epoch(Test) = 4.112285, Test Accuracy= 0.11000
    Epoch Finished
    Loss after one Epoch(Training) = 4.530641, Training Accuracy= 0.13000
    Loss after one Epoch(Validation) = 4.874103, Validation Accuracy= 0.10000
    Loss after one Epoch(Test) = 4.225502, Test Accuracy= 0.14000
    Epoch Finished
    Loss after one Epoch(Training) = 3.664831, Training Accuracy= 0.26000
    Loss after one Epoch(Validation) = 3.207108, Validation Accuracy= 0.29000
    Loss after one Epoch(Test) = 4.435939, Test Accuracy= 0.17000
    Epoch Finished
    Loss after one Epoch(Training) = 3.682740, Training Accuracy= 0.26000
    Loss after one Epoch(Validation) = 3.794605, Validation Accuracy= 0.21000
    Loss after one Epoch(Test) = 3.890673, Test Accuracy= 0.17000
    Epoch Finished
    Loss after one Epoch(Training) = 3.638363, Training Accuracy= 0.27000
    Loss after one Epoch(Validation) = 4.057161, Validation Accuracy= 0.21000
    Loss after one Epoch(Test) = 4.400304, Test Accuracy= 0.19000
    Epoch Finished
    Loss after one Epoch(Training) = 3.290856, Training Accuracy= 0.13000
    Loss after one Epoch(Validation) = 3.573865, Validation Accuracy= 0.02000
    Loss after one Epoch(Test) = 3.289892, Test Accuracy= 0.13000
    Epoch Finished
    Loss after one Epoch(Training) = 3.249848, Training Accuracy= 0.13000
    Loss after one Epoch(Validation) = 3.816904, Validation Accuracy= 0.09000
    Loss after one Epoch(Test) = 3.365518, Test Accuracy= 0.09000
    Epoch Finished
    Loss after one Epoch(Training) = 3.261417, Training Accuracy= 0.13000
    Loss after one Epoch(Validation) = 3.051553, Validation Accuracy= 0.13000
    Loss after one Epoch(Test) = 3.935049, Test Accuracy= 0.10000
    Epoch Finished
    Loss after one Epoch(Training) = 3.274293, Training Accuracy= 0.12000
    Loss after one Epoch(Validation) = 3.341079, Validation Accuracy= 0.12000
    Loss after one Epoch(Test) = 3.465601, Test Accuracy= 0.09000
    Epoch Finished
    Loss after one Epoch(Training) = 3.245074, Training Accuracy= 0.12000
    Loss after one Epoch(Validation) = 3.655849, Validation Accuracy= 0.09000
    Loss after one Epoch(Test) = 3.890745, Test Accuracy= 0.11000
    Epoch Finished
    Loss after one Epoch(Training) = 3.242341, Training Accuracy= 0.12000
    Loss after one Epoch(Validation) = 3.527991, Validation Accuracy= 0.04000
    Loss after one Epoch(Test) = 3.207819, Test Accuracy= 0.12000
    Epoch Finished
    Loss after one Epoch(Training) = 3.277830, Training Accuracy= 0.12000
    Loss after one Epoch(Validation) = 3.797029, Validation Accuracy= 0.10000
    Loss after one Epoch(Test) = 3.317770, Test Accuracy= 0.11000
    Epoch Finished
    Loss after one Epoch(Training) = 3.269509, Training Accuracy= 0.12000
    Loss after one Epoch(Validation) = 3.074466, Validation Accuracy= 0.12000
    Loss after one Epoch(Test) = 3.887167, Test Accuracy= 0.10000
    Epoch Finished
    Loss after one Epoch(Training) = 4.100363, Training Accuracy= 0.14000
    Loss after one Epoch(Validation) = 4.208894, Validation Accuracy= 0.10000
    Loss after one Epoch(Test) = 4.150678, Test Accuracy= 0.15000
    Epoch Finished
    Loss after one Epoch(Training) = 4.037428, Training Accuracy= 0.14000
    Loss after one Epoch(Validation) = 4.366947, Validation Accuracy= 0.10000
    Loss after one Epoch(Test) = 4.501517, Test Accuracy= 0.09000
    Epoch Finished
    Loss after one Epoch(Training) = 4.048151, Training Accuracy= 0.14000
    Loss after one Epoch(Validation) = 4.315053, Validation Accuracy= 0.10000
    Loss after one Epoch(Test) = 3.972508, Test Accuracy= 0.10000
    Epoch Finished
    Loss after one Epoch(Training) = 4.046428, Training Accuracy= 0.14000
    Loss after one Epoch(Validation) = 4.649216, Validation Accuracy= 0.08000
    Loss after one Epoch(Test) = 4.125694, Test Accuracy= 0.11000
    Epoch Finished
    Loss after one Epoch(Training) = 4.082591, Training Accuracy= 0.13000
    Loss after one Epoch(Validation) = 3.639134, Validation Accuracy= 0.16000
    Loss after one Epoch(Test) = 4.476624, Test Accuracy= 0.16000
    Epoch Finished
    Loss after one Epoch(Training) = 4.068653, Training Accuracy= 0.13000
    Loss after one Epoch(Validation) = 4.141028, Validation Accuracy= 0.10000
    Loss after one Epoch(Test) = 4.086758, Test Accuracy= 0.15000
    Epoch Finished
    Loss after one Epoch(Training) = 4.066084, Training Accuracy= 0.13000
    Loss after one Epoch(Validation) = 4.252730, Validation Accuracy= 0.10000
    Loss after one Epoch(Test) = 4.357038, Test Accuracy= 0.10000
    Epoch Finished
    Loss after one Epoch(Training) = 4.031103, Training Accuracy= 0.14000
    Loss after one Epoch(Validation) = 4.360917, Validation Accuracy= 0.10000
    Loss after one Epoch(Test) = 3.916987, Test Accuracy= 0.11000
    Epoch Finished
    Loss after one Epoch(Training) = 4.031075, Training Accuracy= 0.14000
    Loss after one Epoch(Validation) = 4.653004, Validation Accuracy= 0.07000
    Loss after one Epoch(Test) = 4.183711, Test Accuracy= 0.10000
    Epoch Finished
    Loss after one Epoch(Training) = 4.039016, Training Accuracy= 0.14000
    Loss after one Epoch(Validation) = 3.654388, Validation Accuracy= 0.15000
    Loss after one Epoch(Test) = 4.228384, Test Accuracy= 0.18000

by shader at August 23, 2016 07:22 AM



numpy.ndarray syntax understanding for confirmation

I am referring to the code example here, and I am specifically confused by this line: X =[:, :2]. Since is 150 (rows) * 4 (columns), I think it means: select all rows, and the first two columns. I am asking here to confirm whether my understanding is correct, since I have spent time but cannot find this syntax defined in the official documentation.

Another question: I am using the following code to get the number of rows and the number of columns, and I am not sure if there is a better, more elegant way. My code is plain Python style, and I am not sure whether numpy has a more idiomatic way to get these values.

print len(  # for number of rows
print len([0])  # for number of columns

Using Python 2.7 with miniconda interpreter.
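
For what it's worth, that reading can be confirmed on a small array, and arr.shape is the idiomatic numpy way to get the row and column counts (a minimal sketch with made-up data standing in for the iris array):

```python
import numpy as np

# small stand-in for the 150 x 4 iris array (hypothetical data)
arr = np.arange(12).reshape(3, 4)

first_two_cols = arr[:, :2]   # all rows, first two columns -- as you suspected
n_rows, n_cols = arr.shape    # idiomatic alternative to len(arr) / len(arr[0])

print(first_two_cols.shape)   # (3, 2)
print(n_rows, n_cols)         # 3 4
```

The slice notation itself is documented under numpy's indexing section (basic slicing); it extends Python's own list slicing to multiple dimensions, one slice per axis.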


# Code source: Gaël Varoquaux
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model, datasets

# import some data to play with
iris = datasets.load_iris()
X =[:, :2]  # we only take the first two features.
Y =

h = .02  # step size in the mesh

logreg = linear_model.LogisticRegression(C=1e5)

# we create an instance of Neighbours Classifier and fit the data., Y)

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(4, 3))
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, edgecolors='k', cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')

plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())

regards, Lin

by Lin Ma at August 23, 2016 06:50 AM

How to run a theano.function on TensorVariable

I want to do something similar to looping over TensorVariables. After some research I noticed I could make use of theano.scan to simulate a loop.

I wrote the following code and ran theano.function on a TensorVariable whose numeric value is not yet determined. But apparently theano.function expects numerical values, not symbolic TensorVariables. Is there a way to run a function on symbolic TensorVariables, or alternatively, a way to convert TensorVariables into numpy arrays as input to theano.function?

The code is as follows:

def CI(observed, event_times, estimated_risk): #C_index in tensor mode
    ti = T.dvector('ti')
    tj = T.dvector('tj')
    o = T.dvector('o')
    has_ones = T.matrix('has_ones')

    omega, updates = theano.scan(fn=lambda tj_element, ti_vector, o_vector: + tj_element,o_vector),
                             non_sequences=[ti, o])
    calculate_omega = theano.function(inputs=[tj, ti, o], outputs=omega)
    om = calculate_omega(event_times,event_times,observed)
    om = transform_positives_to_ones(om)
    om_count = count_ones(om)

and this is the error that I get:

'Expected an array-like object, but found a Variable: '
TypeError: ('Bad input argument to theano function with name "../"  at index 2(0-based)', 'Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?')

The Error is for om = calculate_omega(event_times,event_times,observed) because both event_times and observed are TensorVariables.

by Mohsen Salari at August 23, 2016 06:47 AM

Sentiment analysis Using BernoulliNB Algorithm in C

I have chosen this topic as my college project. I'm interested in learning sentiment analysis, but I don't know where to start with the coding.

Need Help.

So far I have only studied BernoulliNB.
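
As a starting point, here is a minimal from-scratch Bernoulli Naive Bayes in Python/numpy (a sketch with made-up toy data, not C, but the same arithmetic ports directly to C loops): each document is a 0/1 word-presence vector, and both the presence and the absence of each word contribute to the likelihood.

```python
import numpy as np

def train_bernoulli_nb(X, y):
    """Fit Bernoulli Naive Bayes with Laplace smoothing.
    X: (n_docs, n_words) 0/1 word-presence matrix; y: class labels."""
    classes = np.unique(y)
    log_priors = np.log([np.mean(y == c) for c in classes])
    # P(word present | class), Laplace-smoothed so no probability is 0 or 1
    probs = np.array([(X[y == c].sum(axis=0) + 1.0) / ((y == c).sum() + 2.0)
                      for c in classes])
    return classes, log_priors, probs

def predict_bernoulli_nb(model, X):
    classes, log_priors, probs = model
    # Bernoulli likelihood uses both presence (X) and absence (1 - X)
    log_lik = X @ np.log(probs).T + (1 - X) @ np.log(1 - probs).T
    return classes[np.argmax(log_lik + log_priors, axis=1)]

# toy sentiment data: columns = ["good", "great", "bad", "awful"]
X = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 0]])
y = np.array([1, 1, 0, 0])  # 1 = positive, 0 = negative
model = train_bernoulli_nb(X, y)
print(predict_bernoulli_nb(model, np.array([[1, 0, 0, 0], [0, 0, 0, 1]])))
```

Porting this to C mainly means replacing the matrix operations with loops over word counts; the Laplace (+1) smoothing is what keeps log(0) from ever occurring.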

by Anuj Vikal at August 23, 2016 06:38 AM

caffe test error no field named "net" on testing MNIST

I have the same problem as Caffe error: no field named "net" on testing MNIST.


keides2@ubuntu:~/caffe$ build/tools/caffe test -model examples/mnist/lenet_solver.prototxt 
                       -weights examples/mnist/lenet_iter_10000.caffemodel -iterations 100

I get the following output:

 I0820 11:31:33.820005 113569 caffe.cpp:279] Use CPU.
[libprotobuf ERROR google/protobuf/] Error parsing text-format caffe.NetParameter: 2:4: Message type "caffe.NetParameter" has no field named "net".
F0820 11:31:33.844912 113569 upgrade_proto.cpp:79] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: examples/mnist/lenet_solver.prototxt
Check failure stack trace:
@     0x7f3f9744edaa  (unknown) 
@     0x7f3f9744ece4  (unknown) 
@     0x7f3f9744e6e6  (unknown) 
@     0x7f3f97451687  (unknown) 
@     0x7f3f977fc0c7  caffe::ReadNetParamsFromTextFileOrDie() 
@     0x7f3f97834b0f  caffe::Net<>::Net() 
@           0x407843  test() 
@           0x405f7b  main 
@     0x7f3f9645af45  (unknown) 
@           0x406677  (unknown) 
@              (nil)  (unknown)

'lenet_solver.prototxt' and 'lenet_train_test.prototxt' are original (not modified).

And then,

keides2@ubuntu:~/caffe$ printenv PYTHONPATH

Could you help me?

by keisuke shimatani at August 23, 2016 06:37 AM


Forecast of ARMA-GARCH model in R

I managed to forecast a GARCH model yesterday and run a Monte Carlo simulation in R. Nevertheless, I can't do the same with an ARMA-GARCH. I tested 4 different methods but without achieving an ARMA-GARCH simulation with my data.

The packages and the data I used:


getSymbols("DEXB.BR",from="2005-07-01", to="2015-07-01")
STOCK.rtn=diff(STOCK[,6] )
STOCK.diff = STOCK.rtn[2:length(STOCK.rtn)]
GA_1_1=garch(ARI_2_1$residuals, order = c(1,1))

First tested method

specifi = garchSpec(model = list(ar = c(0.49840, -0.0628), ma =c(-0.4551), omega = 8.393e-08, alpha = 1.356e-01, beta = 8.844e-01))

garchSim(spec = specifi, n = 500, n.start = 200, extended = FALSE)

This lead to a "NaN" forecast.

garchSim(spec = specifi, n = 500)

n = 1000
armagarch.sim_1 = rep(0,n)
armagarch.sim_50 = rep(0,n)
armagarch.sim_100 = rep(0,n)
for(i in 1:n) {
    armagarch.sim = garchSim(spec = specifi, n = 500, n.start = 200, extended = FALSE)
    armagarch.sim_1[i] = armagarch.sim[1]
    armagarch.sim_50[i] = armagarch.sim[50]
    armagarch.sim_100[i] = armagarch.sim[100]
}


Second tested method

GSgarch.Sim(N = 500, mu = 0, a = c(0.49840, -0.0628), b = c(-0.4551), omega = 8.393e-08, alpha = c(1.356e-01), gm = c(0), beta = c(8.844e-01), cond.dist = "norm")

This part works.


Garmagarch.sim_1 = rep(0,n)
Garmagarch.sim_50 = rep(0,n)
Garmagarch.sim_100 = rep(0,n)

for(i in 1:n) {
    Garmagarch.sim = GSgarch.Sim(N = 500, mu = 0, a = c(0.49840, -0.0628), b = c(-0.4551), omega = 8.393e-08, alpha = c(1.356e-01), gm = c(0), beta = c(8.844e-01), cond.dist = "norm")

    Garmagarch.sim_1[i] = Garmagarch.sim[1]
    Garmagarch.sim_50[i] = Garmagarch.sim[50]
    Garmagarch.sim_100[i] = Garmagarch.sim[100]
}


The simulation runs but

> Garmagarch.sim[1]
[1] "arma(2,1)-aparch(1,1) ## Intercept:FALSE"


> Garmagarch.sim[50]

Third tested method

ga_arma = garch.sim(alpha=c(8.393e-08,1.356e-01),beta =8.844e-01 ,n=500, ntrans=200)

This led to

Error in garch.sim(alpha = c(8.393e-08, 0.1356), beta = 0.8844, n = 500,  : 
  Check model: it does not have finite variance

arima.sim(ARI_2_1, 500, innov = ga_arma ,n.start = 200)

And this to

Error in arima.sim(ARI_2_1, 500, innov = ga_arma, n.start = 200) : 
  la partie 'ar' du mopdèle n'est pas stationaire

which means that the 'ar' part of the model isn't stationary.

Fourth tested method

forecast(ARI_2_1, h = 500, bootstrap = TRUE, npaths=200)

This one actually works but I don't know how to add the GARCH component.

forecast(specifi, h = 500, bootstrap = TRUE, npaths=200)
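
For intuition, an ARMA-GARCH simulation is just GARCH innovations fed through the ARMA recursion. Here is a language-agnostic sketch (written in Python/numpy for brevity, with the fitted coefficients above plugged in). Note that alpha + beta = 0.1356 + 0.8844 = 1.02 > 1, which is likely exactly why garch.sim complained that the model "does not have finite variance":

```python
import numpy as np

def simulate_arma_garch(n, ar, ma, omega, alpha, beta, burn=200, seed=0):
    """Simulate an ARMA(p,q) path whose innovations follow a GARCH(1,1)."""
    rng = np.random.default_rng(seed)
    total = n + burn
    eps = np.zeros(total)              # GARCH innovations
    sigma2 = np.full(total, omega)     # conditional variances
    for t in range(1, total):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    x = np.zeros(total)                # ARMA recursion on the innovations
    for t in range(2, total):
        x[t] = (sum(a * x[t - i - 1] for i, a in enumerate(ar)) + eps[t]
                + sum(m * eps[t - j - 1] for j, m in enumerate(ma)))
    return x[burn:]

# coefficients taken from the garchSpec call above
path = simulate_arma_garch(500, ar=[0.49840, -0.0628], ma=[-0.4551],
                           omega=8.393e-08, alpha=0.1356, beta=0.8844)
print(len(path))  # 500
```

Because alpha + beta > 1 here, the conditional variance drifts upward over long horizons; re-fitting the GARCH part under a stationarity constraint (alpha + beta < 1) should make garchSim and garch.sim behave.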

Thanks !

by Tom at August 23, 2016 06:28 AM

Papers on temporary price impact

Can anyone recommend papers that model how long temporary price impact last when you buy / sell a trade? This would fall under the TCA realm (Trade Cost Analysis).

Thank you.

by user3022875 at August 23, 2016 06:25 AM


What is the VC Dimension of the $k$-Junta class

A boolean function $f(x_1,x_2,\dots,x_n)$ is a $k$-junta if it depends on at most $k$ variables. Consider the class $\mathcal{J}_{\leq k}$ of all $k$-juntas over $n$ variables; what is the VC dimension of this class?

Or at least, is there any known method to construct the largest shattered set for small values of $k$, say when $k=1$?
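
For what it's worth, a standard counting argument (a sketch, not a complete answer) pins the VC dimension between $2^k$ and roughly $2^k + k\log_2 n$:

```latex
% Upper bound: any finite class H satisfies VCdim(H) <= log_2 |H|, and
\[
  |\mathcal{J}_{\le k}| \;\le\; \binom{n}{k}\, 2^{2^k}
  \quad\Longrightarrow\quad
  \mathrm{VCdim}(\mathcal{J}_{\le k}) \;\le\; 2^k + k \log_2 n .
\]
% Lower bound: fix one set of k relevant variables; every one of the 2^{2^k}
% functions of those variables is a k-junta, so the 2^k assignments to those
% variables form a shattered set:
\[
  \mathrm{VCdim}(\mathcal{J}_{\le k}) \;\ge\; 2^k .
\]
```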

by seteropere at August 23, 2016 06:21 AM


Tensorflow RNN cells weight sharing

I am wondering if in the following code the weights of the two stacked cells are shared:

cell = rnn_cell.GRUCell(hidden_dim)
stacked_cell = tf.nn.rnn_cell.MultiRNNCell([cell] * 2)

If they are not shared, how to force sharing in any RNNs?

Note: what I would more likely want is to share weights in a nested input-to-output connected RNN configuration, where the first layer is cloned for every input of the second layer (e.g., sentences where the 1st layer represents letters and the 2nd layer represents words gathered by iterating over the 1st layer's outputs).

by Guillaume Chevalier at August 23, 2016 06:14 AM

In Keras, If samples_per_epoch is less than the 'end' of the generator when it (loops back on itself) will this negatively affect result?

I'm using Keras with Theano to train a basic logistic regression model.

Say I've got a training set of 1 million entries; it's too large for my system to use the standard method without blowing away memory.

  • I decide to use a python generator function and fit my model using model.fit_generator().
  • My generator function returns batch sized chunks of the 1M training examples (they come from a DB table, so I only pull enough records at a time to satisfy each batch request, keeping memory usage in check).
  • It's an endlessly looping generator, once it reaches the end of the 1 million, it loops and continues over the set

There is a mandatory argument in fit_generator() to specify samples_per_epoch. The documentation indicates

samples_per_epoch: integer, number of samples to process before going to the next epoch.

I'm assuming the fit_generator() doesn't reset the generator each time an epoch runs, hence the need for a infinitely running generator.

I typically set the samples_per_epoch to be the size of the training set the generator is looping over.

However, if samples_per_epoch is smaller than the size of the training set the generator is working from, and nb_epoch > 1:

  • Will you get odd/adverse/unexpected training results, as it seems the epochs will have differing sets of training examples to fit to?
  • If so, do you 'fast-forward' your generator somehow?
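
The mechanics are easy to see with a plain Python cycling generator (no Keras required; this sketch just assumes, as you do, that fit_generator keeps pulling batches without resetting the generator between epochs):

```python
from itertools import islice

def cycling_batches(data, batch_size):
    """Endlessly loop over `data`, yielding consecutive batches."""
    i = 0
    while True:
        batch = [data[(i + j) % len(data)] for j in range(batch_size)]
        i = (i + batch_size) % len(data)
        yield batch

data = list(range(10))          # pretend this is the 1M-row training set
gen = cycling_batches(data, 4)
# "epoch" = 8 samples (2 batches), smaller than the 10-sample dataset:
epoch1 = list(islice(gen, 2))   # [[0,1,2,3], [4,5,6,7]]
epoch2 = list(islice(gen, 2))   # [[8,9,0,1], [2,3,4,5]] -- different examples
print(epoch1, epoch2)
```

So with samples_per_epoch smaller than the dataset, the "epoch" boundary simply falls at a different point in the data each time; over many epochs every example is still seen equally often, and only the per-epoch grouping (and thus per-epoch metrics) shifts.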

by Ray at August 23, 2016 06:10 AM

Matlab : How to compare the reconstructed data after doing probabilistic PCA?

I am having a tough time understanding how to prepare the model for applying probabilistic PCA so that I can estimate the data in the lower-dimensional space. I have understood the theory described in the paper Tipping and Bishop, Probabilistic Principal Component Analysis. I want to estimate the source S using a point estimation technique such as maximum a posteriori (MAP) estimation. The model is

X = A*S

where X is the observed higher-dimensional data that is assumed to be generated from a lower-dimensional space S, and A is the transformation matrix. But, according to the equations, if the output data are the observations, say, load fisheriris

X = meas   % load fisheriris provides the measurements in the variable 'meas'

then what is S?

I can work with any randomly generated data, but when it comes to real data base, I cannot understand how to apply the concept. In practical cases, what we observe are the higher dimension data and we want to find its lower dimension representation either by using principal component analysis or its probabilistic version (there are other dimension reduction techniques as well, but I want to first understand pca)

The implementation of the theory is presented in Code

I have modified the code for the probabilistic pca and the modified version is below

 clear all
n1 = 10; %d dimension
n2 = 100; % number of examples

ncomp = 2; % target reduced dimension
%Generating data according to the model
% X(i,j) = A(i,:)*S(:,j) + noise
Ar = orth(randn(n1,ncomp))*diag(ncomp:-1:1);

T = 1:n2;
 S = [ exp(-T/150).*cos( 2*pi*T/50 )
          exp(-T/150).*sin( 2*pi*T/50 ) ];

% Normalizing to zero mean and unit variance
S = ( S - repmat( mean(S,2), 1, n2 ) );
S = S ./ repmat( sqrt( mean( S.^2, 2 ) ), 1, n2 );

Xr = Ar * S;

X = Xr;  %no noise

  Xprobe = X ;

opts = struct( 'maxiters', 30,...
               'algorithm', 'map',...
               'xprobe', Xprobe,...
               'uniquesv', 0,...
               'cfstop', [ 100 0 0 ],...
               'minangle', 0 );

%  pca_pt
  [ A, S_est, Mu, V, hp, lc ] =  pca_pt( X, ncomp,opts );
 hold on

%Non probabilistic PCA

clear all
n1 = 5; %d dimension
n2 = 500; % number of examples

ncomp = 2; % target reduced dimension
%Generating data according to the model
% X(i,j) = A(i,:)*S(:,j) + noise
Ar = orth(randn(n1,ncomp))*diag(ncomp:-1:1);
T = 1:n2;
%generating synthetic data from a dynamical model
S = [ exp(-T/150).*cos( 2*pi*T/50 )
       exp(-T/150).*sin( 2*pi*T/50 ) ];
% Normalizing to zero mean and unit variance
S = ( S - repmat( mean(S,2), 1, n2 ) );
S = S ./ repmat( sqrt( mean( S.^2, 2 ) ), 1, n2 );
Xr = Ar * S;

    X = Xr ;

XX = X';
[pc, ~] = eigs(cov(XX), ncomp);

The above code works ok for the synthetic data but how can I apply the concept to a real data base set?

Problem : I can compare the performance of PPCA algorithm by plotting the actual reduced dimensional data $S$ and the estimated $S_{est}$ This is applicable to synthetic data. But for real data like the FisherIris, after doing PPCA how can I compare the output of PPCA (estimated S) with the actual S? I will not be having the actual S since I only have the fisher data. I am really confused.
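
One way out of the dilemma above: with real data such as fisheriris there is no ground-truth S, so the usual check is to compare X against its reconstruction (A * S_est plus the mean) in the observed space. A sketch of that idea on synthetic data (in Python/numpy for brevity; all names here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic "real" data with low-rank structure plus a little noise
S_true = rng.standard_normal((100, 2))
A_true = rng.standard_normal((2, 6))
X = S_true @ A_true + 0.05 * rng.standard_normal((100, 6))

# plain PCA: project onto the top-2 principal directions and reconstruct
mu = X.mean(axis=0)
Xc = X - mu
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:2].T                      # (6, 2) loading directions
S_est = Xc @ W                    # estimated latent coordinates
X_hat = S_est @ W.T + mu          # reconstruction in the observed space

# with real data there is no true S, so judge the fit in X-space instead:
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(rel_err)  # small here, since X is nearly rank-2
```

The same comparison works for PPCA output: map S_est back through the estimated A (plus the mean Mu) and measure the reconstruction error, or compare held-out log-likelihoods, rather than trying to recover an S that was never observed.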

by SKM at August 23, 2016 05:59 AM

Azure Machine Learning Experiment Creation

I am new to creating experiments in Azure ML. I want to build a small sample POC on Azure ML.

I have data for students consisting of StudentID, student name, and marks for monthly tests 1, 2 and 3. I want to predict the marks for the final monthly test (i.e., monthly test 4).

I don't know how to create the experiment or what kinds of transformations to use when predicting the data.

Anyone Please...

Thanks in Advance Pradeep

by Pradeep at August 23, 2016 05:52 AM

Derivation in Theano Implementation

I have two vectors A and B, and I want to train a matrix G for A G (B^T). How do I implement this in Theano? The derivation confuses me. Thanks a lot.
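
For the scalar form f(G) = a^T G b, the gradient with respect to G is the outer product a b^T (which is what Theano's T.grad would derive automatically). A numpy check of that identity via finite differences (a sketch with made-up small dimensions):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = rng.standard_normal(3), rng.standard_normal(4)
G = rng.standard_normal((3, 4))

f = lambda G: a @ G @ b          # scalar bilinear form a^T G b
analytic = np.outer(a, b)        # claimed gradient df/dG = a b^T

# finite-difference check of every entry of the gradient
eps = 1e-6
numeric = np.zeros_like(G)
for i in range(3):
    for j in range(4):
        E = np.zeros_like(G)
        E[i, j] = eps
        numeric[i, j] = (f(G + E) - f(G - E)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

In Theano itself the whole derivation collapses to T.grad(cost, G) on the symbolic expression, which yields the same a b^T; the manual derivative is only needed if you want to sanity-check the symbolic gradient.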

by FancyCoder at August 23, 2016 05:42 AM


Building Financial Data Time Series Database from scratch

My company is starting a new initiative aimed at building a financial database from scratch.

We would be using it in these ways:

  1. Time series analysis of: a company's financial data (ex: IBM's total fixed assets over time), aggregations (ex: total fixed assets for the materials sector over time), etc.
  2. Single company snapshot: various data points of a single company
  3. Analysis of multiple companies across multiple data fields for a single time frame, usually the current day.
  4. Backtesting, rank analysis, data analysis, etc. of ideas and custom factors.

Approximate breadth of data:

  1. 3000 companies
  2. 3500 data fields (ex: total fixed assets, earnings, etc.)
  3. 500 aggregation levels
  4. Periodicity: daily, monthly, quarterly, annual
  5. 20 year look-back that would grow over time


  1. What database should we choose? We are currently limited to free options and we prefer open source (on principle). Currently we use PostgreSQL.
  2. How should I structure this schema-wise? I am thinking of breaking up the field types into categories (balance sheet, descriptive, income statement, custom calculations, etc.) so each company would have a table for balance sheet, descriptive, income statement, custom calculations, etc. with each row representing one day and appropriate fields for the category of table for columns/fields. That will be my fully normalized database. Using the fully normalized database, I will then build a data warehouse, temp tables, views, etc. that are not fully normalized to make queries fast for the various use cases described previously. One issue with this approach is the number of tables. If I have, say, 5 categories of company data and 3000 companies I will have 15,000 tables in my fully normalized database for just storing the company data. But still, from my perspective, it seems like the best way to do it.
  3. What is the best strategy for indexing and structuring the time series portion of this? I've talked to a few people and I did some research on time series database indexing/structure, but help/references/tips/etc. in this area, even if they duplicate what I have found, would be helpful. I realize this depends on the answer to #1 above, so maybe assume I am staying with PostgreSQL and I will be building out the "time series" functionality specific bells and whistles myself.


  • In-depth technical answers and references/links much preferred.
  • This is for a small buy side financial investment firm.
  • If you have been down this road before, suggestions outside of the scope of my initial question are welcome.
  • We cannot compromise on the amount of data, so reducing the amount of data isn't an option for us; however, the numbers I supplied are estimates only.
  • If there is a better place to ask this question, please let me know.
  • There is much more to what we want to do, but this represents the core of what we want to do from a data structure perspective.

by mountainclimber at August 23, 2016 05:40 AM

How to calibrate Hull-White from zero curve?

I am interested in calibrating a Hull-White model to the market.

I do not, however, have data on anything except the market zero curves, as all derivatives are being traded OTC. My plan is to calibrate the model to the zero curve.

  1. Will this produce a sensible calibration of the model in respect of derivatives?
  2. If not, how does one proceed in this case?

by RonRich at August 23, 2016 05:23 AM



What is the point of stacking multiple convolution layers?

In the context of a convolutional neural network designed to extract DNA motifs, why would one stack convolution layers without any activation or max pooling functions in between?
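
One relevant observation (a sketch, not a full answer): without a nonlinearity in between, two stacked convolutions collapse into a single convolution whose kernel is the convolution of the two kernels, yet the stacked form reaches the same receptive field with fewer parameters:

```python
import numpy as np

x = np.random.default_rng(2).standard_normal(50)   # a toy 1-D "sequence"
k1 = np.array([1.0, -2.0, 0.5])                    # two 3-tap kernels (6 params)
k2 = np.array([0.3, 0.1, -1.0])

stacked = np.convolve(np.convolve(x, k1), k2)      # conv layer 1 then layer 2
single = np.convolve(x, np.convolve(k1, k2))       # one conv with a 5-tap kernel
print(np.allclose(stacked, single))  # True: stacking without activations is linear
```

So stacking without activations only buys a factorized (fewer-parameter) version of one wider linear filter; any extra expressive power of depth comes from the nonlinearities or pooling placed between the layers.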

by odeng at August 23, 2016 04:29 AM

confused by numpy meshgrid output

Using Python 2.7 with the miniconda interpreter. I am confused by what "N-D coordinate" means in the following statement; could anyone explain how xv and yv are calculated in the sample below? That would be great.

"Make N-D coordinate arrays for vectorized evaluations of N-D scalar/vector fields over N-D grids, given one-dimensional coordinate arrays x1, x2,..., xn."

>>> nx, ny = (3, 2)
>>> x = np.linspace(0, 1, nx)
>>> y = np.linspace(0, 1, ny)
>>> xv, yv = meshgrid(x, y)
>>> xv
array([[ 0. ,  0.5,  1. ],
       [ 0. ,  0.5,  1. ]])
>>> yv
array([[ 0.,  0.,  0.],
       [ 1.,  1.,  1.]])

regards, Lin

by Lin Ma at August 23, 2016 04:15 AM


Determining maximum strategy capacity and optimal order size for low frequency equity strategy

I have developed a low frequency equity trading strategy that seems to work well with stocks in the S&P 500. Someone asked me about the maximum capacity of the strategy (how much AUM I could handle), and that led me to think about how best to determine how much of each stock I could trade, with the considerations of slippage, market impact, and creating a footprint in the market with a large order in mind.

My initial thought was to assume that I could safely trade (that is avoid undue cost and visibility) a small percentage of each stock's ADV, so the maximum capacity of the strategy would be the dollar value of the sum of x % of ADV of all of the stocks. If each order was then roughly say <1% of the stock's ADV, I would probably enter these orders as limit orders with a minimum display size and the rest in reserve, with the expectation that the market will work its way through my limit. While this would minimize slippage, I am concerned that predatory systems could pick up on the reserve quantity, and that that will provide information. I could layer the order in at different price levels, but that could be expensive. Any thoughts on this are appreciated.

I am also thinking about adding money management considerations to each order size (Kelly Consideration or Fixed Ratio), which may only further complicate the question of max capacity. Is there a general rule of thumb for determining capacity at a high level? Cheers

by pedro at August 23, 2016 04:11 AM


What is the relation between variational inference, variational Bayes and variational EM?

What is the relation between variational inference, variational Bayes and variational EM?

by sbsbsb945 at August 23, 2016 03:29 AM


How do I make use of unused space on my boot drive on FreeBSD

I have an old FreeBSD Server (running 7.3-RELEASE) that desperately needs additional storage. In fact, it has some-- the original 20G SCSI drives have been replaced by 300G SCSI drives, so in theory there is 280G available that could be used.

I'd like to make use of this space. I think the best way to do this is by formatting the unused space as a new slice on the existing drive, but I'm not clear how to do this without destroying the data on the existing slice. Most of the documentation I can find about doing this refers to initial installation. I know how to set up slices and partitions during initial installation, but not how to claim unused space on the drive AFTER initial installation.

(I'd also be happy to expand the slice and add additional partitions to the existing slice, but I've heard that this is riskier).

I thought the easy way to do this might be to use /stand/sysinstall, but when I go into either Configure -> FDisk or Configure -> Label, I get this message:

No disks found!  Please verify that your disk controller is being
properly probed at boot time.  See the Hardware Guide on the
Documentation menu for clues on diagnosing this type of problem.

This is obviously untrue, since I'm actually running off of a disk when I get this message, but maybe sysinstall just doesn't like messing with the boot disk?

Output of fdisk da0:

******* Working on device /dev/da0 *******
parameters extracted from in-core disklabel are:
cylinders=2235 heads=255 sectors/track=63 (16065 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=2235 heads=255 sectors/track=63 (16065 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 63, size 35905212 (17531 Meg), flag 80 (active)
        beg: cyl 0/ head 1/ sector 1;
        end: cyl 1023/ head 254/ sector 63
The data for partition 2 is:
The data for partition 3 is:
The data for partition 4 is:

Output of bsdlabel da0s1

# /dev/da0s1:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:  2097152        0    4.2BSD     2048 16384    89
  b:  2097152  2097152      swap
  c: 35905212        0    unused        0     0         # "raw" part, don't edit
  e:  2097152  4194304    4.2BSD     2048 16384    89
  f: 29613756  6291456    4.2BSD     2048 16384    89


I came across the advice to use sade for this purpose. Unfortunately, sade can't see much empty space:

         0         63         62        -     12     unused        0
        63   35905212   35905274    da0s1      8    freebsd      165
  35905275      10501   35915775        -     12     unused        0

This may be a dead end. Do I need to figure out drive geometry somehow? It might be relevant to mention that the drive is a RAID 1 mirror set; originally the mirrored drives were both 20G SCSI drives but they've both been swapped out with 300G drives. I'm willing to temporarily break the mirror if that will help.

by davidcl at August 23, 2016 02:44 AM


Finite automaton - language acceptance

In the book "The New Turing Omnibus", an excercise reads as the follows:

"Show that no finite automaton can accept the language consisting of all words of the form $a^n b^n, n=1,2,3,...$ The formula represents $n$ a's followed by $n$ b's."

I have no idea why a finite automaton shouldn't accept such words. Couldn't one simply design an automaton that reads a word and always outputs "Accepted"?
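
The catch is that "accepts the language" means accepting exactly the words in it, i.e., also rejecting everything outside it (such as aab), so an always-accept machine does not qualify. The standard pigeonhole sketch of the impossibility:

```latex
% Suppose a DFA with m states accepts L = \{ a^n b^n : n \ge 1 \}.
% Among the m+1 prefixes a^1, a^2, \dots, a^{m+1}, two must reach the same
% state:
\[
  \exists\, 1 \le i < j \le m+1 :\quad \delta(q_0, a^i) = \delta(q_0, a^j).
\]
% From that state the machine behaves identically on any suffix, hence
\[
  a^i b^i \in L \;\Longleftrightarrow\; a^j b^i \in L,
\]
% contradicting a^i b^i \in L while a^j b^i \notin L (as j \ne i).
% A finite set of states cannot "count" an unbounded number of a's.
```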

by Robert Hönig at August 23, 2016 01:57 AM

arXiv Cryptography and Security

Automated Synthesis of Semantic Malware Signatures using Maximum Satisfiability. (arXiv:1608.06254v1 [cs.CR])

This paper proposes a technique for automatically learning semantic malware signatures for Android from very few samples of a malware family. The key idea underlying our technique is to look for a maximally suspicious common subgraph (MSCS) that is shared between all known instances of a malware family. An MSCS describes the shared functionality between multiple Android applications in terms of inter-component call relations and their semantic metadata (e.g., data-flow properties). Our approach identifies such maximally suspicious common subgraphs by reducing the problem to maximum satisfiability. Once a semantic signature is learned, our approach uses a combination of static analysis and a new approximate signature matching algorithm to determine whether an Android application matches the semantic signature characterizing a given malware family.

We have implemented our approach in a tool called ASTROID and show that it has a number of advantages over state-of-the-art malware detection techniques. First, we compare the semantic malware signatures automatically synthesized by ASTROID with manually-written signatures used in previous work and show that the signatures learned by ASTROID perform better in terms of accuracy as well as precision. Second, we compare ASTROID against two state-of-the-art malware detection tools and demonstrate its advantages in terms of interpretability and accuracy. Finally, we demonstrate that ASTROID's approximate signature matching algorithm is resistant to behavioral obfuscation and that it can be used to detect zero-day malware. In particular, we were able to find 22 instances of zero-day malware in Google Play that are not reported as malware by existing tools.

by <a href="">Yu Feng</a>, <a href="">Osbert Bastani</a>, <a href="">Ruben Martins</a>, <a href="">Isil Dillig</a>, <a href="">Saswat Anand</a> at August 23, 2016 01:30 AM

Datatype defining rewrite systems for the ring of integers, and for natural and integer arithmetic in unary view. (arXiv:1608.06212v1 [cs.LO])

A datatype defining rewrite system (DDRS) is a ground-complete term rewriting system, intended to be used for the specification of datatypes. As a follow-up of an earlier paper we define two concise DDRSes for the ring of integers, each comprising only twelve rewrite rules, and prove their ground-completeness. Then we introduce DDRSes for a concise specification of natural number arithmetic and integer arithmetic in unary view, that is, arithmetic based on unary append (a form of tallying) or on successor function. Finally, we relate one of the DDRSes for the ring of integers to the above-mentioned DDRSes for natural and integer arithmetic in unary view.

by <a href="">Jan A. Bergstra</a>, <a href="">Alban Ponse</a> at August 23, 2016 01:30 AM

MISO: An intermediate language to express parallel and dependable programs. (arXiv:1608.06171v1 [cs.DC])

One way to write fast programs is to explore the potential parallelism and take advantage of the high number of cores available in microprocessors. This can be achieved by manually specifying which code executes on which thread, by using compiler parallelization hints (such as OpenMP or Cilk), or by using a parallel programming language (such as X10, Chapel or Aeminium). Regardless of the approach, all of these programs are compiled to an intermediate lower-level language that is sequential, thus preventing the backend compiler from optimizing the program and observing its parallel nature. This paper presents MISO, an intermediate language that expresses the parallel nature of programs and that can be targeted by front-end compilers. The language defines 'cells', which are composed of a state and a transition function from one state to the next. This language can express both sequential and parallel programs, and provides information for a backend compiler to generate efficient parallel programs. Moreover, MISO can be used to automatically add redundancy to a program, by replicating the state or by taking advantage of different processor cores, in order to provide fault tolerance for programs running on unreliable hardware.

by <a href="">Alcides Fonseca</a>, <a href="">Raul Barbosa</a> at August 23, 2016 01:30 AM

Effective and Complete Discovery of Order Dependencies via Set-based Axiomatization. (arXiv:1608.06169v2 [cs.DB] UPDATED)

Integrity constraints (ICs) provide a valuable tool for expressing and enforcing application semantics. However, formulating constraints manually requires domain expertise, is prone to human error, and may be excessively time-consuming, especially on large datasets. Hence, proposals for automatic discovery have been made for some classes of ICs, such as functional dependencies (FDs), and recently, order dependencies (ODs). ODs properly subsume FDs, as they can additionally express business rules involving order; e.g., an employee never has a higher salary while paying lower taxes compared with another employee.
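As a toy illustration of the kind of constraint being discovered (hypothetical code, not from the paper), the following sketch checks whether ordering a small table by one attribute also orders it by another, which is the simple form of order dependency behind the salary/tax example:

```python
# Hypothetical illustration: check whether sorting a table by attribute
# `lhs` yields the values of attribute `rhs` in non-decreasing order.
def od_holds(rows, lhs, rhs):
    ordered = sorted(rows, key=lambda r: r[lhs])
    values = [r[rhs] for r in ordered]
    return all(a <= b for a, b in zip(values, values[1:]))

employees = [
    {"salary": 50, "tax": 10},
    {"salary": 60, "tax": 12},
    {"salary": 70, "tax": 15},
]
print(od_holds(employees, "salary", "tax"))  # no one with higher salary pays lower tax
```

The discovery problem the paper addresses is the harder converse: finding a complete, minimal set of all such dependencies that hold on a dataset, rather than verifying a single candidate.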

We address the limitations of prior work on OD discovery, which has factorial complexity in the number of attributes, is incomplete (i.e., it does not discover valid ODs that cannot be inferred from the ones found), and is not concise (i.e., it can result in "redundant" discovery and overly large discovery sets). We improve significantly on complexity, offer completeness, and define a compact canonical form. This is based on a novel polynomial mapping to a canonical form for ODs, and a sound and complete set of axioms (inference rules) for canonical ODs. This allows us to develop an efficient set-containment, lattice-driven OD discovery algorithm that uses the inference rules to prune the search space. Our algorithm has exponential worst-case time complexity in the number of attributes and linear complexity in the number of tuples. We prove that it produces a complete, minimal set of ODs (i.e., minimal with regard to the canonical representation). Finally, using real and synthetic datasets, we experimentally show orders-of-magnitude performance improvements over the current state-of-the-art algorithm and demonstrate the effectiveness of our techniques.

by <a href="">Jaroslaw Szlichta</a>, <a href="">Parke Godfrey</a>, <a href="">Lukasz Golab</a>, <a href="">Mehdi Kargar</a>, <a href="">Divesh Srivastava</a> at August 23, 2016 01:30 AM

Monadic Datalog Containment on Trees Using the Descendant-Axis. (arXiv:1608.06130v1 [cs.LO])

In their AMW14 paper, Frochaux, Grohe, and Schweikardt showed that the query containment problem for monadic datalog on finite unranked labeled trees is Exptime-complete when (a) considering unordered trees using the child-axis, and when (b) considering ordered trees using the axes firstchild, nextsibling, and child. Furthermore, when the descendant-axis is also allowed, the query containment problem was shown to be solvable in 2-fold exponential time, but it remained open to determine the problem's exact complexity in the presence of the descendant-axis. The present paper closes this gap by showing that, in the presence of the descendant-axis, the problem is 2Exptime-hard.

by <a href="">Andr&#xe9; Frochaux</a>, <a href="">Nicole Schweikardt</a> at August 23, 2016 01:30 AM

Estimating Maximum Error Impact in Dynamic Data-driven Applications for Resource-aware Adaption of Software-based Fault-Tolerance. (arXiv:1608.06103v1 [cs.DC])

The rise of transient faults in modern hardware requires system designers to consider errors occurring at runtime. Both hardware- and software-based error handling must be deployed to meet application reliability requirements. The level of required reliability can vary across system components and depend on input and state, so a selective use of resilience methods is advised, especially for resource-constrained platforms as found in embedded systems. If an error occurring at runtime can be classified as having negligible or tolerable impact, less effort can be spent on correcting it. As the actual impact of an error often depends on the state of the system at the time of occurrence, it cannot be determined precisely for highly dynamic workloads in data-driven applications. We present a concept to estimate error propagation in sets of tasks with variable data dependencies. This allows for a coarse-grained analysis of the impact a failed task may have on the overall output. As an application example, we demonstrate our method on a typical dynamic embedded application, namely a decoder for the H.264 video format.

by <a href="">Bj&#xf6;rn B&#xf6;nninghoff</a>, <a href="">Horst Schirmeier</a> at August 23, 2016 01:30 AM

On Preambles With Low Out of Band Radiation for Channel Estimation. (arXiv:1608.06098v1 [cs.NI])

Existing preamble-based channel estimation techniques give no consideration to the out-of-band (OOB) radiation of the transmit preambles, which is a key aspect for novel communication schemes in future cellular systems. In this paper, preambles with low OOB radiation are designed for channel estimation. Two particular preamble design techniques are proposed and their performance is analyzed in terms of OOB radiation and estimation error. The obtained preambles are shown to have 5 to 20 dB lower OOB radiation than existing preamble-based estimation techniques. As a case study, the estimated channel values are used in the equalization of a MIMO-GFDM system aimed at transmit diversity.

by <a href="">Gourab Ghatak</a>, <a href="">Maximilian Matth&#xe9;</a>, <a href="">Adrish Banerjee</a>, <a href="">Gerhard P. Fettweis</a> at August 23, 2016 01:30 AM

Propositional dynamic logic with Belnapian truth values. (arXiv:1608.06084v1 [cs.LO])

We introduce BPDL, a combination of propositional dynamic logic PDL with the basic four-valued modal logic BK studied by Odintsov and Wansing (`Modal logics with Belnapian truth values', J. Appl. Non-Class. Log. 20, 279--301 (2010)). We modify the standard arguments based on canonical models and filtration to suit the four-valued context and prove weak completeness and decidability of BPDL.

by <a href="">Igor Sedl&#xe1;r</a> at August 23, 2016 01:30 AM

Worst case QC-MDPC decoder for McEliece cryptosystem. (arXiv:1608.06080v1 [cs.CR])

The QC-MDPC variant of the McEliece encryption scheme enjoys relatively small key sizes as well as a security reduction to hard problems of coding theory. Furthermore, it remains secure against a quantum adversary and is very well suited to low-cost implementations on embedded devices.

Decoding MDPC codes is achieved with the (iterative) bit-flipping algorithm, as for LDPC codes. Variable-time decoders might leak some information on the code structure (that is, on the sparse parity-check equations) and must be avoided. A constant-time decoder is easy to emulate, but its running time depends on the worst case rather than on the average case. So far, implementations have focused on minimizing the average cost. We show that tuning the algorithm to reduce the maximal number of iterations is not the same as tuning it to reduce the average cost. This provides some indications on how to engineer the QC-MDPC-McEliece scheme to resist timing side-channel attacks.
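For readers unfamiliar with the decoder under discussion, here is a minimal, variable-time toy sketch of the bit-flipping idea (an illustrative assumption, not the authors' constant-time variant): each iteration flips the bits involved in the largest number of unsatisfied parity checks, and the returned iteration count is exactly the quantity whose variability can leak timing information.

```python
# Toy bit-flipping decoder for a binary code with parity-check matrix H
# (list of 0/1 rows) and received word y. Returns (decoded word, iterations).
def bit_flip_decode(H, y, max_iters=20):
    y = list(y)
    n = len(y)
    for it in range(max_iters):
        syndrome = [sum(h[j] & y[j] for j in range(n)) % 2 for h in H]
        if not any(syndrome):
            return y, it  # all checks satisfied; `it` varies with the error pattern
        # per bit, count the unsatisfied checks it participates in
        counts = [sum(s for h, s in zip(H, syndrome) if h[j]) for j in range(n)]
        threshold = max(counts)
        for j in range(n):
            if counts[j] == threshold:
                y[j] ^= 1  # flip the most "suspicious" bits
    return y, max_iters

# Length-3 repetition code: checks y0=y1 and y1=y2; one flipped bit is corrected.
word, iters = bit_flip_decode([[1, 1, 0], [0, 1, 1]], [1, 0, 1])
```

In a real QC-MDPC deployment the parity-check matrix is sparse and quasi-cyclic, and the engineering question raised above is how to bound the worst-case number of iterations rather than the average.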

by <a href="">Julia Chaulet</a>, <a href="">Nicolas Sendrier</a> at August 23, 2016 01:30 AM

PowerWalk: Scalable Personalized PageRank via Random Walks with Vertex-Centric Decomposition. (arXiv:1608.06054v1 [cs.IR])

Most methods for Personalized PageRank (PPR) precompute and store all accurate PPR vectors, and at query time return the ones of interest directly. However, the storage and computation of all accurate PPR vectors can be prohibitive for large graphs, especially when caching them in memory for real-time online querying. In this paper, we propose a distributed framework that strikes a better balance between offline indexing and online querying. The offline indexing attains a fingerprint of the PPR vector of each vertex by performing billions of "short" random walks in parallel across a cluster of machines. We prove that our indexing method has exponential convergence, achieving the same precision as previous methods with a much smaller number of random walks. At query time, the new PPR vector is composed as a linear combination of related fingerprints, in a highly efficient vertex-centric decomposition manner. Interestingly, the resulting PPR vector is much more accurate than its offline counterpart because it effectively uses more random walks in its estimation. More importantly, we show that such decomposition for a batch of queries can be processed very efficiently using a shared decomposition. Our implementation, PowerWalk, takes advantage of advanced distributed graph engines and outperforms the state-of-the-art algorithms by orders of magnitude. In particular, it responds to tens of thousands of queries on graphs with billions of edges in just a few seconds.
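The fingerprint idea builds on the classic Monte Carlo estimator for PPR, sketched below in simplified single-machine form (hypothetical code, not PowerWalk's distributed implementation): a walk restarts with probability alpha at each step, and the empirical distribution of walk endpoints estimates the PPR vector of the source.

```python
import random

# Monte Carlo Personalized PageRank: run many short random walks from
# `source`; the fraction of walks ending at each vertex estimates its PPR.
def ppr_monte_carlo(graph, source, alpha=0.15, num_walks=10000, rng=None):
    """graph: dict vertex -> list of out-neighbors; alpha: restart probability."""
    rng = rng or random.Random(0)
    visits = {}
    for _ in range(num_walks):
        v = source
        # continue the walk with probability 1 - alpha while v has out-edges
        while rng.random() > alpha and graph.get(v):
            v = rng.choice(graph[v])
        visits[v] = visits.get(v, 0) + 1
    return {v: c / num_walks for v, c in visits.items()}

g = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
pi = ppr_monte_carlo(g, "a")  # probability mass concentrates near the source
```

PowerWalk's contribution, per the abstract, is to precompute only short-walk fingerprints offline and combine them at query time, rather than running full estimations like this per query.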

by <a href="">Qin Liu</a>, <a href="">Zhenguo Li</a>, <a href="">John C.S. Lui</a>, <a href="">Jiefeng Cheng</a> at August 23, 2016 01:30 AM

Deterministic and Fast Randomized Test-and-Set in Optimal Space. (arXiv:1608.06033v1 [cs.DC])

The test-and-set object is a fundamental synchronization primitive for shared memory systems. A test-and-set object stores a bit, initialized to 0, and supports one operation, test&set(), which sets the bit's value to 1 and returns its previous value. This paper studies the number of atomic registers required to implement a test-and-set object in the standard asynchronous shared memory model with n processes. The best lower bound is log(n)-1 for obstruction-free (Giakkoupis and Woelfel, 2012) and deadlock-free (Styer and Peterson, 1989) implementations. Recently a deterministic obstruction-free implementation using O(sqrt(n)) registers was presented (Giakkoupis, Helmi, Higham, and Woelfel, 2013). This paper closes the gap between these known upper and lower bounds by presenting a deterministic obstruction-free implementation of a test-and-set object from Theta(log n) registers of size Theta(log n) bits. We also provide a technique to transform any deterministic obstruction-free algorithm, in which, from any configuration, any process can finish if it runs for b steps without interference, into a randomized wait-free algorithm for the oblivious adversary, in which the expected step complexity is polynomial in n and b. This transformation allows us to combine our obstruction-free algorithm with the randomized test-and-set algorithm by Giakkoupis and Woelfel (2012), to obtain a randomized wait-free test-and-set algorithm from Theta(log n) registers, with expected step-complexity Theta(log* n) against the oblivious adversary.
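The object's sequential specification, as defined above, can be sketched directly (a lock stands in here for the atomicity that the paper instead constructs from read/write registers):

```python
import threading

# A test-and-set object: a bit, initially 0, with one operation that
# sets the bit to 1 and returns its previous value, atomically.
class TestAndSet:
    def __init__(self):
        self._bit = 0
        self._lock = threading.Lock()

    def test_and_set(self):
        with self._lock:
            prev, self._bit = self._bit, 1
            return prev

# Among competing threads, exactly one observes 0 and "wins".
tas = TestAndSet()
results = []
threads = [threading.Thread(target=lambda: results.append(tas.test_and_set()))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The hard part addressed by the paper is implementing this behavior wait-free (or obstruction-free) from plain atomic registers, without any built-in mutual exclusion, using only Theta(log n) registers.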

by <a href="">George Giakkoupis</a>, <a href="">Maryam Helmi</a>, <a href="">Lisa Higham</a>, <a href="">Philipp Woelfel</a> at August 23, 2016 01:30 AM

A Vision for Online Verification-Validation. (arXiv:1608.06012v1 [cs.PL])

Today's programmers face a false choice between creating software that is extensible and software that is correct. Specifically, dynamic languages permit software that is richly extensible (via dynamic code loading, dynamic object extension, and various forms of reflection), and today's programmers exploit this flexibility to "bring their own language features" to enrich extensible languages (e.g., by using common JavaScript libraries). Meanwhile, such library-based language extensions generally lack enforcement of their abstractions, leading to programming errors that are complex to avoid and predict.

To offer verification for this extensible world, we propose online verification-validation (OVV), which consists of language and VM design that enables a "phaseless" approach to program analysis, in contrast to the standard static-dynamic phase distinction. Phaseless analysis freely interposes abstract interpretation with concrete execution, allowing analyses to use dynamic (concrete) information to prove universal (abstract) properties about future execution.

In this paper, we present a conceptual overview of OVV through a motivating example program that uses a hypothetical database library. We present a generic semantics for OVV, and an extension to this semantics that offers a simple gradual type system for the database library primitives. The result of instantiating this gradual type system in an OVV setting is a checker that can progressively type successive continuations of the program until a continuation is fully verified. To evaluate the proposed vision of OVV for this example, we implement the VM semantics (in Rust), and show that this design permits progressive typing in this manner.

by <a href="">Matthew A. Hammer</a>, <a href="">Bor-Yuh Evan Chang</a>, <a href="">David Van Horn</a> at August 23, 2016 01:30 AM

Convergence of Even Simpler Robots without Location Information. (arXiv:1608.06002v1 [cs.DC])

The design of distributed gathering and convergence algorithms for tiny robots has recently received much attention. In particular, it has been shown that convergence problems can even be solved for very weak, \emph{oblivious} robots: robots which cannot maintain state from one round to the next. The oblivious robot model is hence attractive from a self-stabilization perspective, where state is subject to adversarial manipulation. However, to the best of our knowledge, all existing robot convergence protocols rely on the assumption that robots, despite being "weak", can measure distances.

We in this paper initiate the study of convergence protocols for even simpler robots, called \emph{monoculus robots}: robots which cannot measure distances. In particular, we introduce two natural models which relax the assumptions on the robots' cognitive capabilities: (1) a Locality Detection ($\mathcal{LD}$) model in which a robot can only detect whether another robot is closer than a given constant distance or not, (2) an Orthogonal Line Agreement ($\mathcal{OLA}$) model in which robots only agree on a pair of orthogonal lines (say North-South and West-East, but without knowing which is which).

The problem turns out to be non-trivial, and simple median and angle bisection strategies can easily increase the distances among robots (e.g., the area of the enclosing convex hull) over time. Our main contributions are deterministic self-stabilizing convergence algorithms for these two models, together with a complexity analysis. We also show that the assumptions made in our models are in some sense minimal: relaxing the assumptions on the \textit{monoculus robots} further leads to impossibility results.

by <a href="">Debasish Pattanayak</a>, <a href="">Kaushik Mondal</a>, <a href="">Partha Sarathi Mandal</a>, <a href="">Stefan Schmid</a> at August 23, 2016 01:30 AM

Epidemiological Approach for Data Survivability in Unattended Wireless Sensor Networks. (arXiv:1608.05951v1 [cs.DC])

Unattended Wireless Sensor Networks (UWSNs) are wireless sensor networks characterized by sporadic sink presence and operation in hostile settings. The absence of the sink for periods of time prevents sensor nodes from offloading data in real time and offers greatly increased opportunities for attacks resulting in erasure, modification, or disclosure of sensor-collected data. In this paper, we focus on UWSNs where sensor nodes collect and store data locally and try to upload all the information once the sink becomes available. One of the most relevant issues pertaining to UWSNs is guaranteeing a certain level of information survivability in an unreliable network, even in the presence of a powerful attacker. We first introduce an epidemic-domain-inspired approach to model information survivability in UWSNs. Next, we derive a fully distributed algorithm that supports these models and give correctness proofs.

by <a href="">Jacques M. Bahi</a>, <a href="">Christophe Guyeux</a>, <a href="">Mourad Hakem</a>, <a href="">Abdallah Makhoul</a> at August 23, 2016 01:30 AM

Theoretical design and circuit implementation of integer domain chaotic systems. (arXiv:1608.05945v1 [nlin.CD])

In this paper, a new approach for constructing integer domain chaotic systems (IDCS) is proposed, and its chaotic behavior is mathematically proven according to Devaney's definition of chaos. Furthermore, an analog-digital hybrid circuit is developed for realizing the designed basic IDCS. In the IDCS circuit design, the chaos generation strategy is realized through a sample-hold circuit and a decoder circuit so as to convert the uniform noise signal into a random sequence, which plays a key role in the circuit implementation. Experimental observations further validate, for the first time, the proposed systematic methodology.

by <a href="">Qianxue Wang</a>, <a href="">Simin Yu</a>, <a href="">Christophe Guyeux</a>, <a href="">Jacques Bahi</a>, <a href="">Xiaole Fang</a> at August 23, 2016 01:30 AM

Two Security Layers for Hierarchical Data Aggregation in Sensor Networks. (arXiv:1608.05936v1 [cs.DC])

Due to resource-restricted sensor nodes, it is important to minimize the amount of data transmission in sensor networks. To reduce the amount of data sent, an aggregation approach can be applied along the path from sensors to the sink. However, as sensor networks are often deployed in untrusted and even hostile environments, sensor nodes are prone to node compromise attacks. Hence, an end-to-end secure aggregation approach is required to ensure healthy data reception. In this paper, we propose two layers for secure data aggregation in sensor networks. First, we provide an end-to-end encryption scheme that supports operations over ciphertext. It is based on elliptic curve cryptography, which exploits a smaller key size, allows a higher number of operations on ciphertexts, and prevents two identical plaintexts from being distinguished through their ciphertexts. Second, we propose a new watermarking-based authentication that enables sensor nodes to verify the identity of the other nodes they are communicating with. Our experiments show that our hybrid approach to secure data aggregation enhances security, significantly reduces computation and communication overhead, and can be practically implemented on off-the-shelf sensor platforms.

by <a href="">Jacques M. Bahi</a>, <a href="">Christophe Guyeux</a>, <a href="">Abdallah Makhoul</a> at August 23, 2016 01:30 AM

Planning With Discrete Harmonic Potential Fields. (arXiv:1608.05931v1 [cs.RO])

In this work a discrete counterpart to the continuous harmonic potential field (HPF) approach is suggested. The extension to the discrete case makes use of the strong relation HPF-based planning has to connectionist artificial intelligence (AI). Connectionist AI systems are networks of simple, interconnected processors running in parallel within the confines of the environment in which the planning action is to be synthesized. It is not hard to see that such a paradigm naturally lends itself to planning on weighted graphs, where the processors may be seen as the vertices of the graph and the relations among them as its edges. Electrical networks are an effective realization of connectionist AI. The utility of the discrete HPF (DHPF) approach is demonstrated in three ways. First, the capability of the DHPF approach to generate new, abstract planning techniques is demonstrated by constructing a novel, efficient, optimal, discrete planning method called the M* algorithm. Second, its ability to augment the capabilities of existing planners is demonstrated by suggesting a generic solution to the lower bound problem faced by the A* algorithm. Third, the DHPF approach is shown to be useful in solving specific planning problems in communication. It is demonstrated that the discrete HPF paradigm can support routing on-the-fly while the network is still in a transient state. It is shown by simulation that if a path to the target always exists and the switching delays in the routers are negligible, a packet will reach its destination despite changes in the network that may take place while the packet is being routed.

by <a href="">Ahmad A. Masoud</a> at August 23, 2016 01:30 AM

FPGA Design for Pseudorandom Number Generator Based on Chaotic Iteration used in Information Hiding Application. (arXiv:1608.05930v1 [cs.CR])

Much research indicates that the inefficient generation of random numbers is a significant bottleneck for information communication applications. Therefore, a Field Programmable Gate Array (FPGA) implementation is developed here to process a scalable fixed-point method for random stream generation. In our previous work, we proposed a technique applying well-defined discrete chaotic iterations that satisfy the well-known Devaney definition of chaos, namely chaotic iterations (CI). We formerly proved that a generator built on CI can provide qualified chaotic random numbers. In this paper, this generator based on chaotic iterations is optimally redesigned for FPGA devices. By doing so, the generation rate can be largely improved. Analyses show that these hardware generators also provide good statistical chaotic random bits and can be cryptographically secure. An application in the information hiding security field is finally given as an illustrative example.

by <a href="">Jacques M. Bahi</a>, <a href="">Xiaole Fang</a>, <a href="">Christophe Guyeux</a>, <a href="">Laurent Larger</a> at August 23, 2016 01:30 AM

Quality Analysis of a Chaotic Proven Keyed Hash Function. (arXiv:1608.05928v1 [cs.CR])

Hash functions are cryptographic tools, which are notably involved in integrity checking and password storage. They are of primary importance for improving the security of exchanges through the Internet. However, as security flaws have recently been identified in the current standard in this domain, new ways to hash digital data must be investigated. In this document an original keyed hash function is evaluated. It is based on asynchronous iterations leading to functions that have been proven to be chaotic. It thus possesses various topological properties, such as uniformity and sensitivity to its initial condition. These properties make our hash function satisfy established security requirements in this field. This claim is qualitatively proven and experimentally verified in this research work, among other things by running a large number of simulations.

by <a href="">Jacques M. Bahi</a>, <a href="">Jean-Fran&#xe7;ois Couchot</a>, <a href="">Christophe Guyeux</a> at August 23, 2016 01:30 AM

A Topological Study of Chaotic Iterations. Application to Hash Functions. (arXiv:1608.05920v1 [nlin.CD])

Chaotic iterations, a tool formerly used in distributed computing, have recently revealed various interesting properties of disorder, leading to their use in the computer science security field. In this paper, a comprehensive study of their topological behavior is proposed. It is stated that, in addition to being chaotic as defined in Devaney's formulation, this tool possesses the property of topological mixing. Additionally, its levels of sensitivity, expansivity, and topological entropy are evaluated. All of these properties lead to completely unpredictable behavior for the chaotic iterations. As chaotic iterations only manipulate binary digits or integers, we show that it is possible to use them to produce truly chaotic computer programs. As an application example, a truly chaotic hash function is proposed in two versions. In the second version, an artificial neural network is used, which can itself be shown to be chaotic according to Devaney.

by <a href="">Christophe Guyeux</a>, <a href="">Jacques M. Bahi</a> at August 23, 2016 01:30 AM

Self-Adaptive Trade-off Decision Making for Autoscaling Cloud-Based Services. (arXiv:1608.05917v1 [cs.DC])

Elasticity in the cloud is often achieved by on-demand autoscaling. In such a context, the goal is to optimize the Quality of Service (QoS) and cost objectives for the cloud-based services. However, the difficulty lies in the fact that these objectives, e.g., throughput and cost, can be naturally conflicting, and the QoS of cloud-based services often interferes due to the shared infrastructure in the cloud. Consequently, dynamic and effective trade-off decision making for autoscaling in the cloud is necessary, yet challenging. In particular, it is even harder to achieve well-compromised trade-offs, where the decision largely improves the majority of the objectives while causing relatively small degradations to others. In this paper, we present a self-adaptive decision-making approach for autoscaling in the cloud. It is capable of adaptively producing autoscaling decisions that lead to well-compromised trade-offs without heavy human intervention. We leverage ant-colony-inspired multi-objective optimization to search for and optimize trade-off decisions; the result is then filtered by compromise-dominance, a mechanism that extracts the decisions with balanced improvements across the trade-offs. We experimentally compare our approach to four state-of-the-art autoscaling approaches: rule-based, heuristic, randomized, and multi-objective genetic algorithm based solutions. The results reveal the effectiveness of our approach over the others, including better quality of trade-offs and significantly smaller violation of the requirements.

by <a href="">Tao Chen</a>, <a href="">Rami Bahsoon</a> at August 23, 2016 01:30 AM

Reducing State Explosion for Software Model Checking with Relaxed Memory Consistency Models. (arXiv:1608.05893v1 [cs.SE])

Software model checking suffers from the so-called state explosion problem, and relaxed memory consistency models make this situation even worse. Worse still, parameterizing model checking by memory consistency models, that is, making the model checker flexible enough to accept definitions of memory consistency models as input, intensifies state explosion. This paper explores specific reasons for state explosion in model checking with multiple memory consistency models, provides some optimizations intended to mitigate the problem, and applies them to McSPIN, a model checker for memory consistency models that we are developing. The effects of the optimizations and the usefulness of McSPIN are demonstrated experimentally by verifying copying protocols of concurrent copying garbage collection algorithms. To the best of our knowledge, this is the first model checking of concurrent copying protocols under relaxed memory consistency models.

by <a href="">Tatsuya Abe</a>, <a href="">Tomoharu Ugawa</a>, <a href="">Toshiyuki Maeda</a>, <a href="">Kousuke Matsumoto</a> at August 23, 2016 01:30 AM

Efficient non-anonymous composition operator for modeling complex dependable systems. (arXiv:1608.05874v1 [cs.DC])

A new model composer is proposed to automatically generate non-anonymous model replicas in the context of performability and dependability evaluation. It is a state-sharing composer that extends the standard anonymous replication composer in order to share the state of a replica among a set of other specific replicas, or between a replica and another external model. This new composition operator aims to improve expressiveness and performance with respect to the standard anonymous replicator, namely the one adopted by the Möbius modeling framework.

by <a href="">Silvano Chiaradonna</a>, <a href="">Felicita Di Giandomenico</a>, <a href="">Giulio Masetti</a> at August 23, 2016 01:30 AM

A Virtual Network PaaS for 3GPP 4G and Beyond Core Network Services. (arXiv:1608.05869v1 [cs.NI])

Cloud computing and Network Function Virtualization (NFV) are emerging as key technologies to overcome the challenges facing 4G and beyond mobile systems. Over the last few years, Platform-as-a-Service (PaaS) has gained momentum and has become more widely adopted throughout IT enterprises. It simplifies application provisioning and accelerates time-to-market while lowering costs. Telcos can leverage the same model to provision 4G and beyond core network services using NFV technology. However, many challenges have to be addressed, mainly due to the specificities of network services. This paper proposes an architecture for a Virtual Network Platform-as-a-Service (VNPaaS) to provision 3GPP 4G and beyond core network services in a distributed environment. As an illustrative use case, the proposed architecture is employed to provision the 3GPP Home Subscriber Server (HSS) as-a-Service (HSSaaS). The HSSaaS is built from Virtualized Network Functions (VNFs) resulting from a novel decomposition of the HSS. A prototype is implemented and early measurements are made.

by <a href="">Mohammad Abu-Lebdeh</a>, <a href="">Sami Yangui</a>, <a href="">Diala Naboulsi</a>, <a href="">Roch Glitho</a>, <a href="">Constant Wette Tchouati</a> at August 23, 2016 01:30 AM

Simple realizability of complete abstract topological graphs simplified. (arXiv:1608.05867v1 [math.CO])

An abstract topological graph (briefly an AT-graph) is a pair $A=(G,\mathcal{X})$ where $G=(V,E)$ is a graph and $\mathcal{X}\subseteq {E \choose 2}$ is a set of pairs of its edges. The AT-graph $A$ is simply realizable if $G$ can be drawn in the plane so that each pair of edges from $\mathcal{X}$ crosses exactly once and no other pair crosses. We show that simply realizable complete AT-graphs are characterized by a finite set of forbidden AT-subgraphs, each with at most six vertices. This implies a straightforward polynomial algorithm for testing simple realizability of complete AT-graphs, which simplifies a previous algorithm by the author. We also show an analogous result for independent $\mathbb{Z}_2$-realizability, where only the parity of the number of crossings for each pair of independent edges is specified.

by <a href="">Jan Kyn&#x10d;l</a> at August 23, 2016 01:30 AM

AllConcur: Leaderless Concurrent Atomic Broadcast (Extended Version). (arXiv:1608.05866v1 [cs.DC])

Most distributed systems require coordination between all components involved. With the steady growth of such systems, the probability of failures increases, which necessitates fault-tolerant agreement protocols. The most common practical agreement protocol, for such scenarios, is leader-based atomic broadcast. In this work, we propose AllConcur, a distributed system that provides agreement through a leaderless concurrent atomic broadcast algorithm, thus, not suffering from the bottleneck of a central coordinator. In AllConcur, all components exchange messages concurrently through a logical overlay network that employs early termination to minimize the agreement latency. Our implementation of AllConcur supports standard sockets-based TCP as well as high-performance InfiniBand Verbs communications. AllConcur can handle up to 135 million requests per second and achieves 17x higher throughput than today's standard leader-based protocols, such as Libpaxos. Therefore, AllConcur not only offers significant improvements over existing solutions, but enables novel hitherto unattainable system designs in a variety of fields.

by <a href="">Marius Poke</a>, <a href="">Torsten Hoefler</a>, <a href="">Colin W. Glass</a> at August 23, 2016 01:30 AM

Steganalyzer performances in operational contexts. (arXiv:1608.05850v1 [cs.MM])

Steganography and steganalysis are two important branches of the information hiding field of research. Steganography methods consist in hiding information in such a way that the secret message is undetectable for the uninitiated. Steganalysis encompasses all the techniques that attempt to detect the presence of such hidden information. The latter is usually achieved by training classifiers to separate innocent images from steganographic ones according to differences in well-selected features. In this article, we investigate whether it is possible to construct a kind of universal steganalyzer without any knowledge of the steganographer's side. The effects on the classification score of modifying either parameters or methods between the learning and testing stages are evaluated, and the possibility of improving the separation score by merging several methods during the learning stage is investigated in more depth.

by <a href="">Yousra A. Fadil</a>, <a href="">Jean-Fran&#xe7;ois Couchot</a>, <a href="">Rapha&#xeb;l Couturier</a>, <a href="">Christophe Guyeux</a> at August 23, 2016 01:30 AM

Resiliency in Distributed Sensor Networks for PHM of the Monitoring Targets. (arXiv:1608.05844v1 [cs.DC])

In condition-based maintenance, real-time observations are crucial for on-line health assessment. When the monitoring system is a wireless sensor network, data loss becomes highly probable and this affects the quality of the remaining useful life prediction. In this paper, we present a fully distributed algorithm that ensures fault tolerance and recovers data loss in wireless sensor networks. We first theoretically analyze the algorithm and give correctness proofs, then provide simulation results and show that the algorithm is (i) able to ensure data recovery with a low failure rate and (ii) preserves the overall energy for dense networks.

by <a href="">Jacques Bahi</a>, <a href="">Wiem Elghazel</a>, <a href="">Christophe Guyeux</a>, <a href="">Mohammed Haddad</a>, <a href="">Mourad Hakem</a>, <a href="">Kamal Medjaher</a>, <a href="">Nourredine Zerhouni</a> at August 23, 2016 01:30 AM

Computation Offloading Decisions for Reducing Completion Time. (arXiv:1608.05839v1 [cs.DC])

We analyze the conditions in which offloading computation reduces completion time. We extend the existing literature by deriving an inequality (Eq. 4) that relates computation offloading system parameters to the bits per instruction ratio of a computational job. This ratio is the inverse of the arithmetic intensity. We then discuss how this inequality can be used to determine the computations that can benefit from offloading as well as the computation offloading systems required to make offloading beneficial for particular computations.
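The general shape of such a break-even condition can be sketched with a simple timing model (an illustrative model only, not necessarily the paper's Eq. 4): offloading helps when the local compute time exceeds the transfer time plus the remote compute time, which bounds the admissible bits-per-instruction ratio of the job.

```python
def offload_beneficial(instructions, bits, local_ips, remote_ips, bandwidth_bps):
    """Return True if offloading reduces completion time under a simple model.

    Illustrative model only (not the paper's Eq. 4): local time is I/s_local,
    offload time is B/r + I/s_remote, where B/I is the bits-per-instruction ratio.
    """
    t_local = instructions / local_ips
    t_offload = bits / bandwidth_bps + instructions / remote_ips
    return t_local > t_offload

# A compute-heavy job (low bits per instruction) benefits from a 10x faster server:
print(offload_beneficial(1e9, 1e6, 1e8, 1e9, 1e7))   # True
# A data-heavy job (high bits per instruction) does not:
print(offload_beneficial(1e6, 1e9, 1e8, 1e9, 1e7))   # False
```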

by <a href="">Salvador Melendez</a>, <a href="">Michael P. McGarry</a> at August 23, 2016 01:30 AM

Proving chaotic behaviour of CBC mode of operation. (arXiv:1608.05838v1 [cs.CR])

The cipher block chaining (CBC) block cipher mode of operation was invented by IBM (International Business Machines) in 1976. It is a very popular way of encrypting, used in various applications. In this paper, we mathematically prove that, under some conditions, the CBC mode of operation can admit a chaotic behaviour in the sense of Devaney. Several cases are studied in detail in order to substantiate this idea.
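CBC's defining feature, which the dynamical-systems view builds on, is that each plaintext block is XORed with the previous ciphertext block before being enciphered. A minimal toy sketch, where the single-block "cipher" is just an XOR with the key (purely illustrative, with none of the security of a real cipher such as AES):

```python
def xor_block(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(blocks, key, iv):
    """Toy CBC: chain each plaintext block with the previous ciphertext block."""
    out, prev = [], iv
    for block in blocks:
        cipher_block = xor_block(xor_block(block, prev), key)
        out.append(cipher_block)
        prev = cipher_block
    return out

def cbc_decrypt(blocks, key, iv):
    out, prev = [], iv
    for block in blocks:
        out.append(xor_block(xor_block(block, key), prev))
        prev = block
    return out

pt = [b"ABCD", b"ABCD"]                         # identical plaintext blocks...
ct = cbc_encrypt(pt, b"keyk", b"ivIV")
print(ct[0] != ct[1])                           # ...encrypt differently: True
print(cbc_decrypt(ct, b"keyk", b"ivIV") == pt)  # round-trip works: True
```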

by <a href="">Abdessalem Abidi</a>, <a href="">Qianxue Wang</a>, <a href="">Belgacem Bouallegue</a>, <a href="">Mohsen Machhout</a>, <a href="">Christophe Guyeux</a> at August 23, 2016 01:30 AM

DEBH: Detection and Elimination Black Holes in Mobile Ad Hoc Network. (arXiv:1608.05830v1 [cs.NI])

A mobile ad hoc network (MANET) is a self-configuring, easy-to-set-up, decentralized network of mobile wireless nodes. Special features of MANETs, like hop-by-hop communications, dynamic topology and open network boundaries, make security highly challenging. Securing routing protocols against malicious nodes is one of the most critical issues in MANET security. In this paper, a novel approach called Detection and Elimination of Black Holes (DEBH) is proposed, which uses a data control packet and an additional Black hole Check (BCh) table for detecting and eliminating malicious nodes. Three different types of black holes are defined and DEBH is studied for them all. Simulation results show that DEBH increases network throughput and decreases packet overhead and delay in comparison with other studied approaches. Moreover, DEBH is able to detect all active malicious nodes which generate faulty routing information.

by <a href="">Ali Dorri</a> at August 23, 2016 01:30 AM

Analysis of Bayesian Classification based Approaches for Android Malware Detection. (arXiv:1608.05812v1 [cs.CR])

Mobile malware has been growing in scale and complexity, spurred by the unabated uptake of smartphones worldwide. Android is fast becoming the most popular mobile platform, resulting in a sharp increase in malware targeting the platform. Additionally, Android malware is evolving rapidly to evade detection by traditional signature-based scanning. Despite current detection measures in place, timely discovery of new malware is still a critical issue. This calls for novel approaches to mitigate the growing threat of zero-day Android malware. Hence, in this paper we develop and analyze proactive machine-learning approaches based on Bayesian classification aimed at uncovering unknown Android malware via static analysis. The study, which is based on a large malware sample set covering a majority of the existing families, demonstrates detection capabilities with high accuracy. Empirical results and comparative analysis are presented, offering useful insight towards the development of effective static-analytic Bayesian classification based solutions for detecting unknown Android malware.

by <a href="">Suleiman Y. Yerima</a>, <a href="">Sakir Sezer</a>, <a href="">Gavin McWilliams</a> at August 23, 2016 01:30 AM

Accelerating finite-rate chemical kinetics with coprocessors: comparing vectorization methods on GPUs, MICs, and CPUs. (arXiv:1608.05794v1 [physics.comp-ph])

Efficient ordinary differential equation solvers for chemical kinetics must take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and nonstiff Runge-Kutta solver are implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms with OpenCL. The performances of these parallel implementations were measured with three chemical kinetic models across several multicore and many-core platforms. Two runtime benchmarks were conducted to clearly determine any performance advantage offered by either method: evaluating the right-hand-side source terms in parallel, and integrating a series of constant-pressure homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi co-processor performed approximately three times faster than the baseline multithreaded code. The SIMT model on the host and Phi was 13-35% slower than the baseline while the SIMT model on the GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased 2.5-2.7x with the SIMD implementations on the host CPU and 4.7-4.9x with the Xeon Phi coprocessor compared to the baseline parallel code. The SIMT implementations on the GPU ran 1.4-1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step-sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower Sandy Bridge or Xeon Phi.

by <a href="">Christopher P. Stone</a>, <a href="">Kyle E. Niemeyer</a> at August 23, 2016 01:30 AM

On Formal Verification in Imperative Multivalued Programming over Continuous Data Types. (arXiv:1608.05787v1 [cs.NA])

Using fundamental ideas from [Brattka&Hertling'98] and by means of object-oriented overloading of operators, the iRRAM library supports imperative programming over the reals with a multivalued semantics of tests that is both sound and computable. We extend Floyd-Hoare logic to formally verify the correctness of symbolic-numerical algorithms employing such primitives for three example problems: truncated binary logarithm, 1D simple root finding, and solving systems of linear equations. This is to be generalized to other hybrid (i.e. discrete and continuous) abstract data types.

by <a href="">Norbert M&#xfc;ller</a>, <a href="">Sewon Park</a>, <a href="">Norbert Preining</a>, <a href="">Martin Ziegler</a> at August 23, 2016 01:30 AM

Non-Orthogonal Multiple Access (NOMA) in Cellular Uplink and Downlink: Challenges and Enabling Techniques. (arXiv:1608.05783v1 [cs.NI])

By combining the concepts of superposition coding at the transmitter(s) and successive interference cancellation (SIC) at the receiver(s), non-orthogonal multiple access (NOMA) has recently emerged as a promising multiple access technique for 5G wireless technology. In this article, we first discuss the fundamentals of uplink and downlink NOMA transmissions and outline their key distinctions (in terms of implementation complexity, detection and decoding at the SIC receiver(s), incurred intra-cell and inter-cell interferences). Later, for both downlink and uplink NOMA, we theoretically derive the NOMA dominant condition for each individual user in a two-user NOMA cluster. NOMA dominant condition refers to the condition under which the spectral efficiency gains of NOMA are guaranteed compared to conventional orthogonal multiple access (OMA). The derived conditions provide direct insights on selecting appropriate users in two-user NOMA clusters. The conditions are distinct for uplink and downlink as well as for each individual user. Numerical results show the significance of the derived conditions for the user selection in uplink/downlink NOMA clusters and provide a comparison to the random user selection. A brief overview of the recent research investigations is then provided to highlight the existing research gaps. Finally, we discuss the potential applications and key challenges of NOMA transmissions.

by <a href="">Hina Tabassum</a>, <a href="">Md Shipon Ali</a>, <a href="">Ekram Hossain</a>, <a href="">Md. Jahangir Hossain</a>, <a href="">Dong In Kim</a> at August 23, 2016 01:30 AM

On Nonconvex Decentralized Gradient Descent. (arXiv:1608.05766v1 [math.OC])

Consensus optimization has received considerable attention in recent years. A number of decentralized algorithms have been proposed for \emph{convex} consensus optimization. However, on \emph{nonconvex} consensus optimization, our understanding of the behavior of these algorithms is limited.

This note first analyzes the convergence of the algorithm Decentralized Gradient Descent (DGD) applied to a consensus optimization problem with a smooth, possibly nonconvex objective function. We use a fixed step size under a proper bound and establish that the DGD iterates converge to a stationary point of a Lyapunov function, which approximates a stationary point of the original problem. The difference between each local point and their global average is subject to a bound proportional to the step size.

This note then establishes similar results for the algorithm Prox-DGD, which is designed to minimize the sum of a differentiable function and a proximable function. While both functions can be nonconvex, a larger fixed step size is allowed if the proximable function is convex.

by <a href="">Jinshan Zeng</a>, <a href="">Wotao Yin</a> at August 23, 2016 01:30 AM

Inference in Probabilistic Logic Programs using Lifted Explanations. (arXiv:1608.05763v1 [cs.AI])

In this paper, we consider the problem of lifted inference in the context of Prism-like probabilistic logic programming languages. Traditional inference in such languages involves the construction of an explanation graph for the query and computing probabilities over this graph. When evaluating queries over probabilistic logic programs with a large number of instances of random variables, traditional methods treat each instance separately. For many programs and queries, we observe that explanations can be summarized into substantially more compact structures, which we call lifted explanation graphs. In this paper, we define lifted explanation graphs and operations over them. In contrast to existing lifted inference techniques, our method for constructing lifted explanations naturally generalizes existing methods for constructing explanation graphs. To compute probability of query answers, we solve recurrences generated from the lifted graphs. We show examples where the use of our technique reduces the asymptotic complexity of inference.

by <a href="">Arun Nampally</a>, <a href="">C. R. Ramakrishnan</a> at August 23, 2016 01:30 AM

Supermodularity in Unweighted Graph Optimization II: Matroidal Term Rank Augmentation. (arXiv:1608.05730v1 [math.CO])

Ryser's max term rank formula with graph theoretic terminology is equivalent to a characterization of degree sequences of simple bipartite graphs with matching number at least $\ell$. In a previous paper by the authors, a generalization was developed for the case when the degrees are constrained by upper and lower bounds. Here two other extensions of Ryser's theorem are discussed. The first one is a matroidal model, while the second one settles the augmentation version. In fact, the two directions shall be integrated into one single framework.

by <a href="">Krist&#xf3;f B&#xe9;rczi</a>, <a href="">Andr&#xe1;s Frank</a> at August 23, 2016 01:30 AM

Supermodularity in Unweighted Graph Optimization III: Highly-connected Digraphs. (arXiv:1608.05729v1 [math.CO])

By generalizing a recent result of Hong, Liu, and Lai on characterizing the degree-sequences of simple strongly connected directed graphs, a characterization is provided for degree-sequences of simple $k$-node-connected digraphs. More generally, we solve the directed node-connectivity augmentation problem when the augmented digraph is degree-specified and simple. As for edge-connectivity augmentation, we solve the special case when the edge-connectivity is to be increased by one and the augmenting digraph must be simple.

by <a href="">Krist&#xf3;f B&#xe9;rczi</a>, <a href="">Andr&#xe1;s Frank</a> at August 23, 2016 01:30 AM

Supermodularity in Unweighted Graph Optimization I: Branchings and Matchings. (arXiv:1608.05722v1 [math.CO])

The main result of the paper is motivated by the following two, apparently unrelated graph optimization problems: (A) as an extension of Edmonds' disjoint branchings theorem, characterize digraphs comprising $k$ disjoint branchings $B_i$ each having a specified number $\mu _i$ of arcs; (B) as an extension of Ryser's maximum term rank formula, determine the largest possible matching number of simple bipartite graphs complying with degree-constraints. The solutions to these problems and to their generalizations will be obtained from a new min-max theorem on covering a supermodular function by a simple degree-constrained bipartite graph. A specific feature of the result is that its minimum cost extension is already {\bf NP}-complete. Therefore classic polyhedral tools themselves definitely cannot be sufficient for solving the problem, even though they do provide useful service in our approach.

by <a href="">Krist&#xf3;f B&#xe9;rczi</a>, <a href="">Andr&#xe1;s Frank</a> at August 23, 2016 01:30 AM

On the queue-number of graphs with bounded tree-width

Authors: Veit Wiechert
Download: PDF
Abstract: A queue layout of a graph consists of a linear order on the vertices and an assignment of the edges to queues, such that no two edges in a single queue are nested. The minimum number of queues needed in a queue layout of a graph is called its queue-number.

We show that for each $k\geq1$, graphs with tree-width at most $k$ have queue-number at most $2^k-1$. This improves upon double exponential upper bounds due to Dujmovi\'c et al. and Giacomo et al. As a consequence we obtain that these graphs have track-number at most $2^{O(k^2)}$.

We complement these results by a construction of $k$-trees that have queue-number at least $k+1$. Already in the case $k=2$ this is an improvement over existing results and solves a problem of Rengarajan and Veni Madhavan, namely that the maximal queue-number of $2$-trees is equal to $3$.
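The no-nesting condition from the abstract is easy to state in code. A small sketch that checks whether a given vertex order and edge-to-queue assignment form a valid queue layout:

```python
from itertools import combinations

def is_queue_layout(order, queues):
    """Check the queue-layout condition: no two edges in one queue may nest.

    order  -- list of vertices giving the linear order
    queues -- list of queues, each a list of edges (u, v)
    """
    pos = {v: i for i, v in enumerate(order)}
    for queue in queues:
        for (a, b), (c, d) in combinations(queue, 2):
            a, b = sorted((pos[a], pos[b]))
            c, d = sorted((pos[c], pos[d]))
            if a < c and d < b or c < a and b < d:   # strictly nested
                return False
    return True

# A 4-cycle admits a 1-queue layout with the vertex order 0, 1, 3, 2:
print(is_queue_layout([0, 1, 3, 2],
                      [[(0, 1), (1, 2), (2, 3), (0, 3)]]))  # True
# With the natural order, edge (1, 2) nests inside (0, 3):
print(is_queue_layout([0, 1, 2, 3], [[(0, 3), (1, 2)]]))    # False
```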

August 23, 2016 01:13 AM

Computing Zigzag Persistent Cohomology

Authors: Clément Maria, Steve Oudot
Download: PDF
Abstract: Zigzag persistent homology is a powerful generalisation of persistent homology that allows one not only to compute persistence diagrams with less noise and using less memory, but also to use persistence in new fields of application. However, due to the increase in complexity of the algebraic treatment of the theory, most algorithmic results in the field have remained of theoretical nature.

This article describes an efficient algorithm to compute zigzag persistence, emphasising its practical interest. The algorithm is a zigzag persistent cohomology algorithm, based on the dualisation of reflection and transposition transformations within the zigzag sequence.

We provide an extensive experimental study of the algorithm along two directions. First, we compare its performance with the zigzag persistent homology algorithm and show the interest of cohomology in zigzag persistence. Second, we illustrate the interest of zigzag persistence in topological data analysis by comparing it to state-of-the-art methods in the field, specifically optimised algorithms for standard persistent homology and sparse filtrations. We compare the memory and time complexities of the different algorithms, as well as the quality of the output persistence diagrams.

August 23, 2016 01:13 AM

Beckett-Gray Codes

Authors: Mark Cooke, Chris North, Megan Dewar, Brett Stevens
Download: PDF
Abstract: In this paper we discuss a natural mathematical structure that is derived from Samuel Beckett's play "Quad". This structure is called a binary Beckett-Gray code. Our goal is to formalize the definition of a binary Beckett-Gray code and to present the work done to date. In addition, we describe the methodology used to obtain enumeration results for binary Beckett-Gray codes of order $n = 6$ and existence results for binary Beckett-Gray codes of orders $n = 7,8$. We include an estimate, using Knuth's method, for the size of the exhaustive search tree for $n=7$. Beckett-Gray codes can be realized as successive states of a queue data structure. We show that the binary reflected Gray code can be realized as successive states of two stack data structures.
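For reference, the binary reflected Gray code mentioned at the end of the abstract can be generated with the classic reflect-and-prefix recursion; a short sketch:

```python
def reflected_gray_code(n):
    """Generate the binary reflected Gray code of order n recursively:
    prefix the order-(n-1) code with 0, then its reversal with 1."""
    if n == 0:
        return [""]
    prev = reflected_gray_code(n - 1)
    return ["0" + w for w in prev] + ["1" + w for w in reversed(prev)]

codes = reflected_gray_code(3)
print(codes)  # ['000', '001', '011', '010', '110', '111', '101', '100']
# Successive codewords differ in exactly one bit (the Gray property):
assert all(sum(a != b for a, b in zip(u, v)) == 1
           for u, v in zip(codes, codes[1:]))
```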

August 23, 2016 01:12 AM

Computing Maximum Flow with Augmenting Electrical Flows

Authors: Aleksander Madry
Download: PDF
Abstract: We present an $\tilde{O}\left(m^{\frac{10}{7}}U^{\frac{1}{7}}\right)$-time algorithm for the maximum $s$-$t$ flow problem and the minimum $s$-$t$ cut problem in directed graphs with $m$ arcs and largest integer capacity $U$. This matches the running time of the $\tilde{O}\left((mU)^{\frac{10}{7}}\right)$-time algorithm of M\k{a}dry (FOCS 2013) in the unit-capacity case, and improves over it, as well as over the $\tilde{O}\left(m \sqrt{n} \log U\right)$-time algorithm of Lee and Sidford (FOCS 2014), whenever $U$ is moderately large and the graph is sufficiently sparse. By well-known reductions, this also gives similar running time improvements for the maximum-cardinality bipartite $b$-matching problem.

One of the advantages of our algorithm is that it is significantly simpler than the ones presented in Madry (FOCS 2013) and Lee and Sidford (FOCS 2014). In particular, these algorithms employ a sophisticated interior-point method framework, while our algorithm is cast directly in the classic augmenting path setting that almost all the combinatorial maximum flow algorithms use. At a high level, the presented algorithm takes a primal-dual approach in which each iteration uses electrical flow computations both to find an augmenting $s$-$t$ flow in the current residual graph and to update the dual solution. We show that by maintaining a certain careful coupling of these primal and dual solutions we are always guaranteed to make significant progress.

August 23, 2016 01:12 AM

The Random Access Zipper: Simple, Purely-Functional Sequences

Authors: Kyle Headley, Matthew A. Hammer
Download: PDF
Abstract: We introduce the Random Access Zipper (RAZ), a simple, purely-functional data structure for editable sequences. A RAZ combines the structure of a zipper with that of a tree: like a zipper, edits at the cursor require constant time; by leveraging tree structure, relocating the edit cursor in the sequence requires logarithmic time. While existing data structures provide these time bounds, none do so with the same simplicity and brevity of code as the RAZ. The simplicity of the RAZ provides the opportunity for more programmers to extend the structure to their own needs, and we provide some suggestions for how to do so.
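To make the zipper half of the construction concrete, here is a minimal purely-functional list zipper (only the flat part of the RAZ idea: edits at the cursor are O(1), but without the tree on top, cursor moves cost O(distance) rather than O(log n)):

```python
class Zipper:
    """A minimal purely-functional list zipper: 'left' holds the elements
    before the cursor in reverse order, 'right' those at and after it."""
    def __init__(self, left=(), right=()):
        self.left, self.right = tuple(left), tuple(right)

    def insert(self, x):           # O(1) edit at the cursor
        return Zipper(self.left, (x,) + self.right)

    def move_right(self):          # shift one element across the cursor
        return Zipper((self.right[0],) + self.left, self.right[1:])

    def to_list(self):             # reassemble the whole sequence
        return list(reversed(self.left)) + list(self.right)

z = Zipper(right=(1, 2, 3)).move_right().insert(99)
print(z.to_list())  # [1, 99, 2, 3]
```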

August 23, 2016 01:11 AM

Squares of Low Maximum Degree

Authors: Manfred Cochefert, Jean-François Couturier, Petr A. Golovach, Daniël Paulusma, Anthony Stewart
Download: PDF
Abstract: A graph H is a square root of a graph G if G can be obtained from H by adding an edge between any two vertices in H that are of distance 2. The Square Root problem is that of deciding whether a given graph admits a square root. This problem is only known to be NP-complete for chordal graphs and polynomial-time solvable for non-trivial minor-closed graph classes and a very limited number of other graph classes. We prove that Square Root is O(n)-time solvable for graphs of maximum degree 5 and O(n^4)-time solvable for graphs of maximum degree at most 6.
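The forward direction (squaring a given H) is straightforward to compute, which makes the hardness of the inverse Square Root problem all the more notable. A short sketch of squaring:

```python
from itertools import combinations

def square(vertices, edges):
    """Return the square of a graph: add an edge between every pair of
    vertices at distance exactly 2 (i.e. with a common neighbour)."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    sq = {frozenset(e) for e in edges}
    for u, v in combinations(vertices, 2):
        if v not in adj[u] and adj[u] & adj[v]:   # distance exactly 2
            sq.add(frozenset((u, v)))
    return {tuple(sorted(e)) for e in sq}

# The path 0-1-2-3 squared gains the edges (0, 2) and (1, 3):
print(sorted(square([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)])))
# [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
```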

August 23, 2016 01:07 AM

A Linear Kernel for Finding Square Roots of Almost Planar Graphs

Authors: Petr A. Golovach, Dieter Kratsch, Daniël Paulusma, Anthony Stewart
Download: PDF
Abstract: A graph H is a square root of a graph G if G can be obtained from H by the addition of edges between any two vertices in H that are of distance 2 from each other. The Square Root problem is that of deciding whether a given graph admits a square root. We consider this problem for planar graphs in the context of the "distance from triviality" framework. For an integer k, a planar+kv graph (or k-apex graph) is a graph that can be made planar by the removal of at most k vertices. We prove that a generalization of Square Root, in which some edges are prescribed to be either in or out of any solution, has a kernel of size O(k) for planar+kv graphs, when parameterized by k. Our result is based on a new edge reduction rule which, as we shall also show, has a wider applicability for the Square Root problem.

August 23, 2016 01:01 AM

Linear Kernels for Separating a Graph into Components of Bounded Size

Authors: Mingyu Xiao
Download: PDF
Abstract: Graph separation and partitioning are fundamental problems that have been extensively studied both in theory and practice. The \textsc{$p$-Size Separator} problem, closely related to the \textsc{Balanced Separator} problem, is to check whether we can delete at most $k$ vertices in a given graph $G$ such that each connected component of the remaining graph has at most $p$ vertices. This problem is NP-hard for each fixed integer $p\geq 1$ and it becomes the famous \textsc{Vertex Cover} problem when $p=1$. It is known that the problem with parameter $k$ is W[1]-hard for unfixed $p$. In this paper, we prove a kernel of $O(pk)$ vertices for this problem, i.e., a linear vertex kernel for each fixed $p \geq 1$. In fact, we first obtain an $O(p^2k)$ vertex kernel by using a nontrivial extension of the expansion lemma. Then we further reduce the kernel size to $O(pk)$ by using some `local adjustment' techniques. Our proofs are based on extremal combinatorial arguments and the main result can be regarded as a generalization of the Nemhauser-Trotter theorem for the \textsc{Vertex Cover} problem. These techniques can potentially be used to improve kernel sizes for more problems, especially problems with kernelization algorithms based on techniques similar to the expansion lemma or crown decompositions.

August 23, 2016 01:01 AM

Low Algorithmic Complexity Entropy-deceiving Graphs

Authors: Hector Zenil, Narsis Kiani
Download: PDF
Abstract: A common practice in the estimation of the complexity of objects, in particular of graphs, is to rely on graph- and information-theoretic measures. Here, using integer sequences with properties such as Borel normality, we explain how these measures are not independent of the way in which a single object, such as a graph, can be described. Considering descriptions that can reconstruct the same graph and are therefore essentially translations of the same description, we will see that, when applying a computable measure such as Shannon Entropy, it is not only necessary to pre-select a feature of interest where there is one, and to make an arbitrary selection where there is not, but also that more general properties, such as the causal likeliness of a graph as a measure (as opposed to randomness), can be largely misrepresented by computable measures such as Entropy and Entropy rate. We introduce recursive and non-recursive (uncomputable) graphs and graph constructions based on integer sequences, whose different lossless descriptions have disparate Entropy values, thereby enabling the study and exploration of a measure's range of applications and demonstrating the weaknesses of computable measures of complexity.

August 23, 2016 01:01 AM

On the orthogonal rank of Cayley graphs and impossibility of quantum round elimination

Authors: Jop Briët, Jeroen Zuiddam
Download: PDF
Abstract: After Bob sends Alice a bit, she responds with a lengthy reply. At the cost of a factor of two in the total communication, Alice could just as well have given the two possible replies without listening and have Bob select which applies to him. Motivated by a conjecture stating that this form of "round elimination" is impossible in exact quantum communication complexity, we study the orthogonal rank and a symmetric variant thereof for a certain family of Cayley graphs. The orthogonal rank of a graph is the smallest number $d$ for which one can label each vertex with a nonzero $d$-dimensional complex vector such that adjacent vertices receive orthogonal vectors.

We show an exp$(n)$ lower bound on the orthogonal rank of the graph on $\{0,1\}^n$ in which two strings are adjacent if they have Hamming distance at least $n/2$. In combination with previous work, this implies an affirmative answer to the above conjecture.

August 23, 2016 01:00 AM



SNI support added to libtls, httpd in -current

Joel Sing (jsing@) has added server-side Server Name Indication (SNI) support to libtls and, based on that, to httpd.


August 23, 2016 12:31 AM



Machine learning in skewed data

I am training a neural network on an emotion recognition dataset with five classes of emotion. I have some problems:

  1. The dataset is skewed: class1 has 100 observations, but class2 has 3000.

    I tried to use SMOTE to balance the data.

  2. The training data is not similar to the testing data; that is a big problem.

I don't know how to solve problem 2; please help me.
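For problem 1, the idea behind SMOTE can be illustrated with its simpler cousin, random oversampling, which just duplicates minority-class examples until the classes match (SMOTE additionally interpolates synthetic points between nearest neighbours). A stdlib-only sketch:

```python
import random
from collections import Counter

def random_oversample(examples, labels, seed=0):
    """Naive random oversampling: duplicate minority-class examples until
    every class matches the majority class size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(v) for v in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        out_x.extend(xs + [rng.choice(xs) for _ in range(target - len(xs))])
        out_y.extend([y] * target)
    return out_x, out_y

X = [[0], [1], [2], [3], [4], [5]]
y = ["a", "a", "a", "a", "a", "b"]        # 5:1 imbalance
Xb, yb = random_oversample(X, y)
print(Counter(yb))                        # both classes now have 5 examples
```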

by 吳東翰 at August 23, 2016 12:16 AM


Fermi level or Fermi energy?

I searched almost the whole internet for the definitions of Fermi level and Fermi energy as used in semiconductors. There are so many definitions, and I still don't know which is the easiest to understand. I really want to understand these Fermi "things" because they have a lot to do with semiconductors, and I really want to understand semiconductors so I can better understand transistors.

by Lu Ka at August 23, 2016 12:05 AM


Hedging, Delta, Gamma, Vega

I sometimes find it difficult to see, how to hedge a portfolio.

Let's say that I created a product consisting of an Asian call (strike 1), a vanilla call (strike 2), and an Asian put (strike 1) on a stock called ABC. Now let's say that the delta of the total product is 60%, gamma is 1,5% and vega is 1,5.

Now if I SHORT this "product", then I can delta-hedge the portfolio by going LONG the underlying (stock ABC) by 0,60 for each product I sell. I think this is correct?

But what about the gamma and the vega?

So I can gamma-hedge as well, but here I cannot just buy/sell the underlying. I need an option on the underlying? And this option needs to have a gamma of 1,5, but do I need to buy or sell the option?

And what about vega?

I hope you guys can help me! Thanks,
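As a sanity check on the mechanics (a hedged sketch with hypothetical numbers, not advice on this particular product): the underlying has delta 1 but zero gamma and vega, so gamma can only be neutralised with a traded option; one then repairs the delta afterwards with the stock.

```python
# Sketch: delta-gamma hedging a SHORT position of one product.
prod_delta, prod_gamma = 0.60, 0.015     # product greeks from the question
opt_delta, opt_gamma = 0.50, 0.015       # hypothetical traded hedge option

pos_delta = -prod_delta                  # short one product flips the signs
pos_gamma = -prod_gamma

# 1. Gamma first: buy enough options to cancel the position's gamma.
n_options = -pos_gamma / opt_gamma       # +1.0 -> BUY one option

# 2. Delta second: the options changed the delta, so re-hedge with stock.
n_stock = -(pos_delta + n_options * opt_delta)

print(n_options, round(n_stock, 2))      # 1.0 0.1
```

Vega works the same way, but with a single hedge option you cannot set gamma and vega independently; matching both generally requires two different options (a 2x2 linear system in the option quantities).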

by Vinter at August 23, 2016 12:04 AM

HN Daily

August 22, 2016


A Turing Machine that exclusively accepts an infinite string

While reading some of the proofs in computability theory, I came to the following conclusion:

We can design a Turing Machine which exclusively accepts finite strings (obvious).

Now, while trying the same for infinite strings, I came to the following conclusion:

There exists no Turing Machine which exclusively accepts infinite strings.

Proof: Since acceptance by a Turing Machine is defined in terms of a finite amount of time, every time we say that an infinite string $w$ is accepted by a Turing Machine, we are usually talking about searching for a "finite pattern" in $w$. So while designing a machine for $w$, there exists a $w'$ which is finite and contains the same "finite pattern" that we try to find in $w$.

Example of "finite pattern" : A string that contains a 0. (Assuming Binary Strings)

So my question is: is my proof right? And if yes, is there a better way to prove this?

by Lashit Jain at August 22, 2016 11:58 PM


Building a Really, Really Small Android App

An exploration of where bloat in an Android app might live.

I wish we had a performance tag for this sort of thing.


by angersock at August 22, 2016 11:34 PM


what method/classifier should I use for a training set with lots of attributes but few examples

Each training example has 100 numeric attributes plus one output class, and about 80% of the attribute values are zero (meaning no data was collected). The values of the attributes vary in a small range, like (-20,20). I have 100 examples like this. What method/classifier should I use? I tried KNN, Naive Bayes, SVM, and random forest/tree; none of these methods gives me accuracy above 50% (I used 10-fold cross validation). What should I do?

by tikael at August 22, 2016 11:24 PM


Mplayer garbling top 30 pixels in full screen

On FreeBSD 10.3, when switching mplayer (MPlayer SVN-r37862-snapshot-3.4.1) to full screen while watching a video with 1920 × 1080 resolution, mplayer garbles the top ~30 pixel rows. For example, it looks like this. The top 30 pixel rows are remnants of the frame seen directly before going to full screen mode.

Luckily mplayer behaves normally if started in full screen mode, like mplayer -fs file.avi, otherwise it would be completely unusable. But it is still a very annoying problem. Do you have any ideas?

by wolf-revo-cats at August 22, 2016 10:36 PM


How to use `Dirichlet Process Gaussian Mixture Model` in Scikit-learn? (n_components?)

My understanding of "an infinite mixture model with the Dirichlet Process as a prior distribution on the number of clusters" is that the number of clusters is determined by the data as they converge to a certain amount of clusters.

This R implementation decides on the number of clusters in this way. The R implementation uses a Gibbs sampler, though, and I'm not sure whether that affects this.

What confuses me is the n_components parameter. n_components: int, default 1 : Number of mixture components. If the number of components is determined by the data and the Dirichlet Process, then what is this parameter?

Ultimately, I'm trying to get:

(1) the cluster assignment for each sample;

(2) the probability vectors for each cluster; and

(3) the likelihood/log-likelihood for each sample.

It looks like (1) is the predict method, and (3) is the score method. However, the output of (1) is completely dependent on the n_components hyperparameter.

My apologies if this is a naive question, I'm very new to Bayesian programming and noticed there was Dirichlet Process in Scikit-learn that I wanted to try out.

Here's the docs:

Here's an example of usage:

Here's my naive usage:

from sklearn.mixture import DPGMM
import pandas as pd

X = pd.read_table("Data/processed/data.tsv", sep="\t", index_col=0)
mod_dpgmm = DPGMM(n_components=3)
mod_dpgmm.fit(X)
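For what it's worth, in this API n_components acts as a truncation level, i.e. an upper bound: the Dirichlet process prior drives the weights of unneeded components toward zero, so you set it generously and count the components that actually receive weight. In later scikit-learn versions DPGMM was replaced by BayesianGaussianMixture; a sketch of the same idea on synthetic data (the 0.05 weight threshold is an arbitrary illustrative choice):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Two well-separated blobs, but a truncation level of 10 components:
# the Dirichlet process prior should leave most components nearly empty.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 8])

bgm = BayesianGaussianMixture(
    n_components=10,  # truncation level / upper bound, not the answer
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)

effective = int((bgm.weights_ > 0.05).sum())
print(effective)  # typically 2 for two clear clusters
```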

by O.rka at August 22, 2016 10:25 PM



make don't know how to make CXXFLAGS. Stop

I am very new to both FreeBSD and compiling code from source, and would really appreciate any help. I am trying to compile fastText from source. When I execute the make command, it returns the following message:

make don't know how to make CXXFLAGS. Stop

Here are the first few lines from the Makefile (the complete file is available in the fastText GitHub repo mentioned above):

CXX = c++
CXXFLAGS = -pthread -std=c++0x
OBJS = args.o dictionary.o matrix.o vector.o model.o utils.o

opt: CXXFLAGS += -O3 -funroll-loops
opt: fasttext

debug: CXXFLAGS += -g -O0 -fno-inline
debug: fasttext

FreeBSD version: 10.3
FreeBSD clang version: 3.4.1
gmake version: 4.1_2

by Imran Ali at August 22, 2016 10:20 PM

Why do some usernames on FreeBSD start with an underscore?

Some usernames on FreeBSD start with an underscore:

_dhcp:*:65:65:dhcp programs:/var/empty:/usr/sbin/nologin

but others do not:

www:*:80:80:World Wide Web Owner:/nonexistent:/usr/sbin/nologin

What's the significance of this underscore? Is it purely historical or does it serve a practical purpose?

Some more examples can be seen in the FreeBSD ports/UIDs file.

by Joe Harrison at August 22, 2016 10:13 PM


Proof of optimal exercise time theorem for American derivative security in N-period binomial asset-pricing model

At least two textbooks (Shreve's Stochastic Calculus for Finance I, Theorem 4.4.5, or Campolieti & Makarov's Financial Mathematics, Proposition 7.8) prove the optimal exercise theorem, which says that the stopping time $\tau^* = \min \{n : V_n = G_n\}$ maximizes $$ V_n = \max_{\tau \in S_n} \tilde{\mathrm{E}}\Big[\mathrm{I}_{\tau \leq N}\frac{1}{(1+r)^{\tau-n}}G_{\tau}\Big] \qquad (1) $$ by demonstrating that the stopped process $ \frac{1}{(1+r)^{n \wedge \tau^*}}V_{n \wedge \tau^*}$ is a martingale under the risk-neutral probability measure.

But how can one conclude from this fact that $\tau^*$ actually maximizes $(1)$?
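
For completeness, here is a sketch of how the martingale property yields optimality (the standard Snell-envelope argument; $\tilde{\mathrm{E}}_n$ denotes conditional expectation under the risk-neutral measure given time-$n$ information). For an arbitrary stopping time $\tau \in S_n$, the stopped discounted value process is a supermartingale, so optional sampling gives

$$ \frac{V_n}{(1+r)^n} \;\geq\; \tilde{\mathrm{E}}_n\Big[\frac{V_{\tau \wedge N}}{(1+r)^{\tau \wedge N}}\Big] \;\geq\; \tilde{\mathrm{E}}_n\Big[\mathrm{I}_{\tau \leq N}\frac{G_{\tau}}{(1+r)^{\tau}}\Big], $$

using $V_k \geq G_k$ and $V_N \geq 0$. Hence $V_n$ dominates the value of every stopping time. For $\tau = \tau^*$ both inequalities become equalities: the first because the stopped process is a true martingale, the second because $V_{\tau^*} = G_{\tau^*}$ by the definition of $\tau^*$. So $\tau^*$ attains the maximum in $(1)$.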

by zer0hedge at August 22, 2016 10:02 PM


Alize LIA_RAL installation [on hold]

I managed to install Alize and now when I try to install LIA_RAL I'm getting errors.

I'm on an Ubuntu 16.04 VM.

The errors occur when I run ./configure and make.

(screenshot of the compile errors)

by salama2121 at August 22, 2016 09:54 PM


The Exceptional Beauty of Doom 3's Source Code

This might be the only time I’ll ever submit a Kotaku article here.

It’s kinda interesting seeing somebody else’s take on that.


by angersock at August 22, 2016 08:50 PM


Pricing a Vanilla swap between coupons; What rates to use?

Vanilla swap question: I entered into a 5Y fixed-for-floating HUF swap. The fixed leg pays annual coupons; the float leg pays semi-annual coupons.

One month later I want to price it. I set up my future values for the fixed coupons for the next 5Y plus the notional at the end, and the next [coupon + notional] for the float leg (the coupon is now in 5 months, and a floating-rate note is valued at par right after it pays its coupon).

I have the BUBOR rates. For the discount factors in my PV, do I use straight-line interpolation of the rates, or the next interest rate? For example, with 0.39Y to go before the floating-rate coupon, do I use the 0.5Y rate, the 0.25Y rate, or the interpolation (weighted average of rate and time) of both?

Also, under continuous compounding, since my fixed leg is ACT/365 and BUBOR is ACT/360, do I have to multiply the BUBOR rate by (365/360) before computing my discount rate, to make it equivalent?

by Seroexcel at August 22, 2016 08:35 PM


Planet Theory

Christian Comment on the Jesus Wife Thing misses the important point

In 2012 a Professor of Divinity at Harvard, Karen King, announced that she had a fragment that seemed to indicate that Jesus had a wife. It was later found to be fake. The article that really showed it was a fake was in the Atlantic Monthly here. A Christian publication called Breakpoint told the story here.

When I read a story about person X being proven wrong, the question uppermost in my mind is: how did X react? If they retract, then they still have my respect and can keep on doing whatever work they were doing. If they dig in their heels and insist they are still right, or that a minor fix will make the proof correct (more common in our area than in history), then they lose all my respect.

The tenth paragraph has the following:

Within days of the article’s publication, King admitted that the fragment is probably a forgery. Even more damaging, she told Sabar that “I haven’t engaged the provenance questions at all” and that she was “not particularly” interested in what he had discovered.

Dr. King should have been more careful and more curious initially (though hindsight is wonderful). However, her admitting it was probably a forgery (probably?) is ... okay. I wish she were more definite in her admission, but... I've seen far worse.

A good scholar will admit when they are wrong. A good scholar will look at the evidence and be prepared to change their minds.

Does Breakpoint itself do this when discussing homosexuality or evolution or global warming? I leave that to the reader.

However, my major point is that the difference between a serious scientist and a crank is what one does when confronted with evidence that one is wrong.

by GASARCH at August 22, 2016 08:25 PM



Python (scikit learn) lda collapsing to single dimension

I'm very new to scikit learn and machine learning in general.

I am currently designing an SVM to predict if a specific amino acid sequence will be cut by a protease. So far the SVM method seems to be working quite well: sensitivity and specificity of one of my SVM models

I'd like to visualize the distance between the two categories (cut and uncut), so I'm trying to use linear discriminant analysis, which is similar to principal component analysis, using the following code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis(n_components=2)
targs = np.array([1 if _ else 0 for _ in XOR_list])
DATA = np.array(data_list)
X_r2 =, targs).transform(DATA)
for c, i, target_name in zip("rg", [1, 0], ["Cleaved", "Not Cleaved"]):
    plt.scatter(X_r2[targs == i], X_r2[targs == i], c=c, label=target_name)
plt.title('LDA of cleavage_site dataset')

However, the LDA is only giving a 1-D result:

In: print X_r2[:5]
Out: [[ 6.74369996]
 [ 4.14254941]
 [ 5.19537896]
 [ 7.00884032]
 [ 3.54707676]]

(scatter plot of the 1-D LDA output)

However, PCA gives 2 dimensions with the data I am inputting:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_r =
print X_r[:5]
Out: [[ 0.05474151  0.38401203]
 [ 0.39244191  0.74113729]
 [-0.56785236 -0.30109694]
 [-0.55633116 -0.30267444]
 [ 0.41311866 -0.25501662]]

Edit: here is a link to two Google Docs with the input data. I am not using the sequence information, just the numerical information that follows. The files are split between positive and negative control data. Input data: file1 file2

by lstbl at August 22, 2016 08:03 PM

Planet Emacsen

Ben Simon: Well Duh: a more intelligent emacs file opening strategy

Last week I finally modernized my PHP emacs setup. I did so by selecting two powerful modes (php-mode and web-mode) and implementing a bit of code to easily toggle between the two. I included this comment in my blog post:

At some point, I could codify this so that files in the snippets directory, for example, always open in web-mode, whereas files loaded from under lib start off in php-mode.

When I wrote the above statement I assumed that I'd need to dust off the ol' Emacs Lisp manual and write some code to analyze the directory of the file being opened. Turns out, I was vastly over-thinking this.

The standard way to associate a mode with a file is by using the elisp variable auto-mode-alist. This is Emacs 101 stuff, and is something I've been doing for 20+ years. In my emacs config file I had this line:

(add-to-list 'auto-mode-alist '("[.]php$" . php-mode))

Which says to open .php files in php-mode. What I'd never done, nor considered, is that you don't have to limit yourself to matching the base filename. The auto-mode-alist is matched against the entire path. To open up ‘snippet’ files in web-mode is trivial. I just put the following code in my .emacs file:

(add-to-list 'auto-mode-alist '("[.]php$" . php-mode))
(add-to-list 'auto-mode-alist 
   '("\\(pages\\|snippets\\|templates\\)/.*[.]php?$" . web-mode))

The order is key here. add-to-list pushes new items to the front of the list. So the first line adds a general rule to open up all .php files in php-mode, and the second line adds a specific rule: if the full path to the file contains the word pages or snippets or templates, then open the file in web-mode. It's not perfect, but files matching this path convention are far more likely to be in the right mode for me.

While I'm a bonehead for not seeing this sooner, I sure do appreciate trivial solutions.

by Ben Simon at August 22, 2016 08:00 PM


Java 8 unbound reference syntax struggle

I'm trying to create a method that puts a Function's result into a Consumer using unbound method references (I think). Here's the scenario. With JDBC's ResultSet you can get row values by index. I have a Bean instance I want to place selected values into. I'm looking for a way to avoid writing boilerplate mapping code and instead achieve something like:

static <T> void copy(Consumer<T> setter, Function<T, Integer> getter, Integer i);

And call it like:

copy(Bean::setAValue, ResultSet::getString, 0)

I don't want to bind the Bean and ResultSet instances too early, since I want this to be usable with any bean or ResultSet.

The example I've been trying to work from is:

public static <T> void println(Function<T, String> function, T value) {
    System.out.println(function.apply(value));
}

Called via:

println(Object::toString, 0L);

by nwillc at August 22, 2016 07:27 PM



tensorflow rnn model path

I have trained the language model using TensorFlow as given in this tutorial.

For training I used the following command.

 bazel-bin/tensorflow/models/rnn/ptb/ptb_word_lm   --data_path=./simple-examples/data/  --model small

The training was successful, with the following output at the end.

Epoch: 13 Train Perplexity: 37.196
Epoch: 13 Valid Perplexity: 124.502
Test Perplexity: 118.624

But I am still confused about where the trained model is stored and how to use it.

by src79 at August 22, 2016 07:09 PM

Combining K Means with anomaly detection via normal distribution

I have some questions concerning Machine Learning and anomaly detection.

My task is to detect anomalies in a big dataset of variables. First I extracted some features, both continuous and boolean. Next I perform scaling (normalization = (x-μ)/σ) followed by KMeans clustering.

Next I would like to focus on anomalies in the big clusters (I assume those observations that fall far away from the centers of the big clusters could also be treated as anomalies). Following the tips from the Coursera course taught by Andrew Ng, I would like to use the normal distribution to do that:

For each big cluster:

  1. Find the parameters μ_i and σ²_i for each feature x_i.

  2. Fit a normal distribution to each feature x_i to compute probabilities.

  3. Compute the product of the probabilities of each feature x_i (I assume the features are independent); in this step I compute the probability of the observation x.

  4. Find those observations whose probability is smaller than ϵ (0.05).

Detailed info is here:

For small clusters (up to 50 observations) I treat all of the observations as anomalies.

Unfortunately my results have not been satisfactory so far. I ended up with a probability equal to 1200 (!) for some x and 0 for others.

So that is why I am asking for your help. Maybe someone can point out what I am doing wrong?

1. In general, does the idea of combining clustering and anomaly detection via the normal distribution sound OK?

2. How should I deal with boolean variables, or variables which take on integer values in a small range (for example 1, 2, 5, 8, 9)?

3. Might the extremely high probability values be caused by the scaling of the variables? For some features (in some clusters) I obtained μ = 0.000006 and σ² = 0.0000008. I found the following paragraph on Wikipedia: "In the limit when σ tends to zero, the probability density f(x) eventually tends to zero at any x ≠ μ, but grows without limit if x = μ". It seems like this is the problem I am facing. How do I deal with it?
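
As a sanity check on point 3, a stdlib sketch using the tiny variance reported in the question: a Gaussian *density* is not a probability, and it can legitimately exceed 1 when σ is small, which would explain the value of 1200 (the numbers below are the ones quoted above, used purely for illustration).

```python
import math

def normal_pdf(x, mu, sigma2):
    """Gaussian density with mean mu and variance sigma2.

    Note this is a density, not a probability: it integrates to 1
    but its pointwise values are unbounded as sigma2 shrinks.
    """
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# With the tiny variance reported (mu ~ 0.000006, sigma2 ~ 0.0000008),
# the density at the mean is far greater than 1.
peak = normal_pdf(0.000006, 0.000006, 0.0000008)
print(peak > 1)
```

So a product of per-feature densities over many features can be arbitrarily large (or, for points in the tails, underflow to 0); comparing it against a fixed ϵ only makes sense once that is taken into account.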

Thanks for any help!

by Lost in ML at August 22, 2016 06:58 PM



How can I get my array to only be manipulated locally (within a function) in Ruby?

Why is my array globally manipulated when I run the Ruby code below? And how can I get arrays to be manipulated only within the function's scope?

a = [[1,0],[1,1]]

def iterate(array) { |row|
    return true if row.keep_if { |i| i != 1 } == []
  end
end

puts a.to_s
puts iterate(a)
puts a.to_s

$ ruby test.rb output:

[[1, 0], [1, 1]]
[[0], []]

I can't get it to work. I've even tried .select { true } and assigning the result to a new name. How does scoping work for Arrays in Ruby? Just for reference, $ ruby -v:

ruby 2.2.1p85 (2015-02-26 revision 49769) [x86_64-linux]

by supercuteboy at August 22, 2016 06:21 PM


What are you working on this week?

This is the weekly thread to discuss what you have done recently and are working on this week.

Please be descriptive and don’t hesitate to champion your accomplishments or ask for help, advice or other guidance.

by seubert at August 22, 2016 06:11 PM


How can I efficiently find the optimal order to apply special offers to a shopping cart?

Given a list of items in a shopping cart, and a list of available special offers which replace one or more regular items to lower their cost, how can I decide the order in which to apply the special offers so as to minimize the final basket price?

For example, I have in my cart 4 items:

  • Coke \$2
  • Coke \$2
  • Sandwich \$3
  • Chocolate bar \$1

Total: \$8

There are two special offers in store:

  • Buy one get one free coke (\$2 saving).
  • Coke, chocolate bar and a sandwich for \$4.50 (\$1.50 saving).

One method of determining the order offers are applied might be to sort them by the savings they give. After applying the offers using this method my cart now looks like this:

  • Buy one get one free coke \$2
  • Sandwich \$3
  • Chocolate bar \$1

Total: \$6

There is no meal deal offer applied, because after the Coke deal is applied there are no Coke items left to make a meal deal. This method of sorting by savings may seem to work, but there are cases in which it fails, for example if the same deals were in place and my cart looked like this:

  • Coke \$2
  • Sandwich \$3
  • Chocolate bar \$1
  • Coke \$2
  • Sandwich \$3
  • Chocolate bar \$1

Total: \$12

After deals are applied, the two Coke items are substituted for the promotional offer first (it being the deal with the greatest saving). There is no other applicable deal, so the algorithm ends, reducing the basket price by \$2. Obviously there is an error here, because if two meal deals were applied before the Coke deal, the price would have been reduced by \$3.

The naive solution to this problem would be to enumerate each possible permutation of the list of special offers, and find the one that minimizes the basket total when applied. This would have a factorial runtime based on the number of special offers available.
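
The naive search can be sketched as follows (a hypothetical encoding of carts and offers as multisets; "apply an order of offers" is read as applying each offer repeatedly while the cart still contains its items):

```python
from itertools import permutations
from collections import Counter

# Hypothetical encoding: an offer is (items_consumed, saving).
OFFERS = [
    (Counter({"coke": 2}), 2.00),                            # buy one get one free Coke
    (Counter({"coke": 1, "sandwich": 1, "choc": 1}), 1.50),  # meal deal
]

def best_saving(cart):
    """Naive O(k!) search: try every order of the k offers, applying
    each offer repeatedly while the cart still contains its items."""
    best = 0.0
    for order in permutations(OFFERS):
        remaining, saved = Counter(cart), 0.0
        for items, saving in order:
            while all(remaining[i] >= n for i, n in items.items()):
                remaining -= items
                saved += saving
        best = max(best, saved)
    return best

cart = Counter({"coke": 2, "sandwich": 2, "choc": 2})
print(best_saving(cart))  # 3.0: two meal deals beat the BOGOF Coke deal
```

On the second example cart above, this correctly finds the \$3 saving that the greedy sort-by-savings heuristic misses.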

Is it possible to improve on a factorial runtime and if not, are there any efficient approximate solutions?

by user19030 at August 22, 2016 06:01 PM



Adding negative EV position to portfolio for diversification?

Say I have a portfolio with expected return $10\%$ and volatility $20\%$. If I have another asset that is either one of:

  1. Negatively correlated
  2. Positively correlated
  3. Uncorrelated

With negative expected return $\mu < 0$ and volatility $\sigma$. Intuitively I think that if we are allowed to use leverage, we should add this to the portfolio under scenarios 1 and 3 to reduce risk (and apply leverage to achieve the desired rate of return). Is this true? How would I size this position if I want to target $10\%$?
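
To make the intuition checkable, one common parametrization (an overlay of weight $w$ of the new asset on top of the existing portfolio; a sketch, not the only convention) gives

$$ \mu_p(w) = \mu_P + w\mu, \qquad \sigma_p^2(w) = \sigma_P^2 + w^2\sigma^2 + 2w\rho\,\sigma_P\sigma, $$

with $\mu_P = 10\%$, $\sigma_P = 20\%$ and $\rho$ the correlation. Since $\frac{d}{dw}\sigma_p^2(w)\big|_{w=0} = 2\rho\,\sigma_P\sigma$, a small overlay reduces variance to first order only when $\rho < 0$ (scenario 1); for $\rho = 0$ (scenario 3) the first-order effect on variance vanishes and the $w^2\sigma^2$ term only adds risk, while the expected return drops by $w|\mu|$ in either case.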

Is this scenario similar to the case of shorting one asset and buying another that is positively correlated with it? In both instances (long/short positively correlated, or long/long negatively or zero correlated), they should be risk reducing. And if we're allowed to use leverage, we should be able to achieve the target return at lower risk? Though this also depends on the bounds of the expected return and correlation?

Basically, is it ever smart to add something with negative expected value to a portfolio depending on its correlation to the portfolio?

by bob at August 22, 2016 05:33 PM



I don't know about you, but when ...

I don't know about you, but when, in public discourse, the perceived majority agrees with me, I get the feeling that I'm fighting for the wrong side.

That's how I feel right now about the Amadeu-Antonio-Stiftung. I notice that I can't think of a single advocate for them at the moment. There is nobody.

Sure, there are people who get personal and call me Querfront or an asshole. Fine. There will always be those. But substantive arguments for this foundation and its conduct? Nothing! Nowhere!

I therefore assume that adverse, probably self-inflicted circumstances are keeping me from perceiving the (eloquent, valid) pro arguments for this foundation.

Hence this appeal. If anyone knows pro arguments, or better yet: people who will passionately present the pro arguments for this foundation themselves, then please put me in touch. I would hate to do them an injustice. Especially when all the facts look crystal clear, as if the position were obvious, it generally isn't; you've merely got yourself stuck in a filter bubble.

But beware: I don't want to hear tactical arguments here (along the lines of "yes, they're screwing up right now, but we still need them for these other reasons for this other campaign"). I want to hear a defense of what they are doing, not why they are possibly the lesser evil compared to the Nazis. That's not enough for me.

August 22, 2016 05:00 PM

While we wait for advocates of the Amadeu-Antonio-Stiftung ...

While we wait for advocates of the Amadeu-Antonio-Stiftung, here is another blog post that applies the standards of the Grundgesetz and comes to unpleasant conclusions.

I never thought that, as an old atheist, I would link to again.

August 22, 2016 05:00 PM


Caffe constant multiply layer

How can I define a multiply-by-constant layer in Caffe, like MulConstant in Torch?

I need a way to add it manually, with a predefined constant, to an existing network.

I've tried the following, but Caffe fails to parse it:

layers {
  name: "caffe.ConstantMul_0"
  type: "Eltwise"
  bottom: "caffe.SpatialConvolution_0"
  top: "caffe.ConstantMul_0"
  eltwise_param {
    op: MUL
    coeff: 0.85
  }
}

by UndeadDragon at August 22, 2016 04:54 PM


Are vector clocks useful in centralized systems?

Vector clocks seem to be a common way to synchronize the partial ordering of events across all clients in a distributed, peer-to-peer system.

Is there any benefit to using them to order events in a centralized system, where one node has the power to order all events anyway? If one computer can decide the order, there would be no need for vector clocks, right?
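
For concreteness, the mechanics under discussion fit in a few lines (a minimal hypothetical sketch, not tied to any particular system; clocks are dicts from node name to counter):

```python
def vc_increment(clock, node):
    """Return a copy of `clock` with `node`'s own counter advanced."""
    c = dict(clock)
    c[node] = c.get(node, 0) + 1
    return c

def vc_merge(a, b):
    """Element-wise max: the clock a node adopts after receiving a message."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

def vc_happened_before(a, b):
    """True iff the event with clock `a` causally precedes the one with `b`."""
    keys = set(a) | set(b)
    return all(a.get(n, 0) <= b.get(n, 0) for n in keys) and a != b

x = vc_increment({}, "p1")               # event on p1
y = vc_increment(vc_merge(x, {}), "p2")  # p2 receives x's clock, then acts
print(vc_happened_before(x, y))          # True: y causally saw x
```

The point of the machinery is detecting *concurrent* events (neither clock dominates); a single sequencer that already totally orders events never produces that case.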

by Alex Chumbley at August 22, 2016 04:47 PM



Text Classification/Document Classification with Sequence Tagging with Mallet

I have documents arranged in folders as classes, called categories. For a new input (such as a question asked), I have to identify its category. What is the best way to do this using MALLET? I've gone through multiple articles about this, but couldn't find such a way.

Also, do I need to do sequence tagging on the input text?

by Amit Kumar at August 22, 2016 04:25 PM

Kotlin on Android: map a cursor to a list

In Kotlin, what's the best way to iterate through an Android Cursor object and put the results into a list?

My auto-converted Java:

val list = ArrayList<String>()
while (c.moveToNext()) {
    list.add(getStringFromCursor(c))
}

Is there a more idiomatic way? In particular, can it be done in a single assignment of a read-only list? E.g....

val list = /*mystery*/.map(getStringFromCursor)

... or some other arrangement, where the list is assigned fully-formed.

by Martin Stone at August 22, 2016 04:23 PM

tensorflow merge input and output

I would like to use two models in TensorFlow in a row: fit the first one and use its output directly as input to the second one. But I couldn't find a good way to do it. I tried to proceed as follows,

x = tf.placeholder('float', shape=[None, image_size[0] , image_size[1]])

y1_ = tf.placeholder('float', shape=[None, image_size[0] , image_size[1], 1])
y2_ = tf.placeholder('float', shape=[None, image_size[0] , image_size[1],\ 
image = tf.reshape(x, [-1,image_size[0] , image_size[1],1])
# y1 first output, to fit
W_conv = weight_variable([1, 1, 1, labels_count])
b_conv = bias_variable([labels_count])

y1 = conv2d(image, W_conv) + b_conv

cross_entropy1 = tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(y1, y1_))
train_step1 =\
# Then use as input the folowing
im_y1 = tf.zeros_initializer([None,image_size[0] , image_size[1],2])

The idea is to first minimise cross_entropy1(y1, y1_) with parameters W_conv and b_conv, then use y1 as a parameter by constructing im_y1 as described.

But as I have written it, it doesn't work, because tf.zeros_initializer refuses to accept the argument None.

What is the right way to pipeline different fits in the same model in TensorFlow?

Thanks for any comments!

by Guissart Sebastien at August 22, 2016 04:22 PM



Enumerating all simply typed lambda terms of a given type

How can I enumerate all simply typed lambda terms which have a specified type?

More precisely, suppose we have the simply typed lambda calculus augmented with numerals and iteration, as described in this answer. How can I enumerate all lambda terms of type N (natural number)?

For example, the first few lambda terms of type N are

succ zero
succ (succ zero), K zero zero
succ (succ (succ zero)), K zero (succ zero), K (succ zero) zero, iter zero succ zero

and so on. How can I systematically continue this pattern, while ensuring that only well-typed terms are generated?
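
One way to systematically continue the pattern is a size-indexed enumeration. The sketch below is hypothetical and deliberately narrow: it only covers the N-typed fragment built from the constructors appearing in the examples (zero, succ, K applied to two numerals, and iter with the step fixed to succ), grouped by number of constructor symbols. A full solution would enumerate terms type-directedly at every type, not just N.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def terms(size):
    """All terms of type N with exactly `size` constructor symbols,
    in the restricted fragment described above."""
    if size <= 0:
        return ()
    out = []
    if size == 1:
        out.append("zero")                      # zero : N
    for t in terms(size - 1):                   # succ t : N
        out.append(f"succ ({t})" if " " in t else f"succ {t}")
    for i in range(1, size - 1):                # K t u : N
        for t in terms(i):
            for u in terms(size - 1 - i):
                out.append(f"K ({t}) ({u})")
    for i in range(1, size - 2):                # iter t succ u : N
        for t in terms(i):
            for u in terms(size - 2 - i):
                out.append(f"iter ({t}) succ ({u})")
    return tuple(out)

for n in range(2, 5):
    print(n, terms(n))
```

By construction every emitted term is well typed at N, and the size groups reproduce the three lines listed above.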

by Carlos at August 22, 2016 04:12 PM



Sharpe Ratio and your annualization

My question is related to How to annualize Sharpe Ratio?, but is a bit different.

Under the assumption of IID returns, if the excess return is positive the SR increases with the time horizon, by a factor of $\sqrt{T}$. Looked at in this way, it seems that simply increasing the time horizon improves the risk reward. But if we take the variance instead of the standard deviation, this effect disappears; moreover, the ratio remains constant over time. This seems strange to me. What do you think?
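
Making the scaling explicit (per-period excess mean $\mu$ and variance $\sigma^2$, IID over $T$ periods):

$$ \mathrm{SR}(T) \;=\; \frac{T\mu}{\sqrt{T}\,\sigma} \;=\; \sqrt{T}\,\frac{\mu}{\sigma}, \qquad \frac{T\mu}{T\sigma^2} \;=\; \frac{\mu}{\sigma^2}. $$

So the $\sqrt{T}$ growth comes entirely from dividing a mean that scales like $T$ by a standard deviation that scales only like $\sqrt{T}$, while the mean-to-variance ratio is horizon-invariant, which is the constancy noted above.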

by markowitz at August 22, 2016 03:32 PM


Grouping GPS coordinates [on hold]

Hello, I am new to data mining. I am using K-means to cluster coordinates using Euclidean distance. The question is: is there any way by which I can map the clustered coordinates to their respective attributes in the original data set?

The original data set looks like this


|    534550298|2015|      046|                      35.7449|                      -86.7489|
|    534550299|2015|     0331|                     -37.5627|                       143.863|
|    534550300|2015|      071|                     -38.2348|                       146.395|
|    534550301|2015|      010|                      35.6501|                      -80.5164|
|    534550302|2015|      020|                         23.0|                        -102.0|
|    534550303|2015|      193|                         23.0|                        -102.0|
|    534550304|2015|      193|                         23.0|                        -102.0|
|    534550305|2015|      020|                      37.7334|                      -84.2999|
|    534550306|2015|      020|                      42.3442|                      -75.1704|
|    534550307|2015|      020|                       -18.15|                         177.5|
|    534550308|2015|      012|                         11.0|                          78.0|
|    534550309|2015|      051|                      -2.0729|                       146.937|
|    534550310|2015|      051|                      -2.0729|                       146.937|
|    534550311|2015|      012|                         11.0|                          78.0|
|    534550312|2015|      012|                         11.0|                          78.0|
|    534550313|2015|      012|                      41.5834|                      -72.7622|
|    534550314|2015|      138|                         39.0|                          35.0|
|    534550315|2015|      120|                        -10.0|                         -55.0|
|    534550316|2015|      080|                      10.5167|                       76.2167|
|    534550317|2015|      020|                      41.5834|                      -72.7622|

This is the output of one cluster after I cluster the GPS points.

 Cluster with a size of 17 starts here:
[[-16.5272, 29.9841], [-16.5272, 29.9841], [-17.8178, 31.0447], [-17.8178, 31.0447], [-17.8178, 31.0447], [-16.8925, 34.6558], [-16.8925, 34.6558], [-16.8925, 34.6558], [-16.8925, 34.6558], [-15.6667, 35.2], [-15.6667, 35.2], [-17.8178, 31.0447], [-17.8178, 31.0447], [-13.5, 34.0], [-16.6389, 32.022], [-16.6389, 32.022], [-16.6389, 32.022]]
Cluster ends here.

So I want a way to retrieve the attributes from the original data set for the clustered coordinates. Is this possible, or is there an alternative solution to the problem?
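
One common fix, sketched with hypothetical toy data: cluster (row index, coordinates) pairs rather than bare coordinates, so every clustered point keeps a pointer back to its row in the original data set. The toy assign() below stands in for the real K-means assignment step.

```python
# Toy rows shaped like the original data set: (id, year, code, lat, lon).
rows = [
    (534550298, 2015, "046", 35.7449, -86.7489),
    (534550301, 2015, "010", 35.6501, -80.5164),
    (534550299, 2015, "0331", -37.5627, 143.863),
]

def assign(lat, lon):
    """Stand-in for the K-means assignment step (hypothetical rule)."""
    return 0 if lat > 0 else 1

# Cluster indices, not coordinates: each cluster stores row indices.
clusters = {}
for idx, row in enumerate(rows):
    lat, lon = row[3], row[4]
    clusters.setdefault(assign(lat, lon), []).append(idx)

# Recover the full attribute rows of a cluster via the stored indices.
cluster0_rows = [rows[i] for i in clusters[0]]
print(cluster0_rows)
```

The same idea works with any clustering library: feed it only the coordinate columns, then use the returned per-row labels to index back into the original table.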

by Shafaat Hussain at August 22, 2016 03:10 PM

Predicting SPC (Statistical Process Control)

I will give a brief explanation of my scenario. The company mass-produces components like valves/nuts/bolts, which need to be measured for dimensions (like length, radius, thickness) for quality purposes. As it is not feasible to inspect all the pieces, they are chosen in batches. For example: from every batch of 100 pieces, 5 will be randomly selected, and the mean of their dimensions is measured and noted for drawing SPC control charts (which plot the mean dimension on the y axis and the batch number on the x axis).

Even though there are a number of factors (like operator efficiency, machine/tool condition, etc.) which affect the quality of the product, they don't seem to be measurable. My objective is to develop a machine learning model to predict the product dimensions (means) of the coming batch samples. This will help the operator to forecast whether there is going to be any significant dimensional variation, so that he can pause work, figure out potential reasons, and thus prevent wastage of the product/material.

I have some idea about R programming and machine learning techniques like decision trees/regression, but couldn't land on a proper model for this, mainly because I couldn't think of the independent variables for this situation. I don't have much idea about time series modelling, though.

Can someone offer some insights/ideas/suggestions about how to tackle this? I am sorry that I had to write a long story, but I just wanted to make things as clear as possible.
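
Not an answer to the modelling question, but as a worked illustration of "predicting the next batch mean": a minimal stdlib baseline (all names and numbers below are made up for illustration) that treats the batch means as a time series and forecasts the next one with a moving average. Any fancier model should at least beat this.

```python
def moving_average_forecast(history, k=3):
    """Predict the next batch mean as the average of the last k means.

    `history` is the list of observed batch means in time order; when
    fewer than k batches exist, use whatever is available.
    """
    window = history[-k:]
    return sum(window) / len(window)

batch_means = [10.01, 10.02, 9.99, 10.00, 10.03]  # hypothetical data
prediction = moving_average_forecast(batch_means)
print(round(prediction, 4))
```

Comparing each batch's actual mean against this forecast gives a running error series; a sustained growth in that error is one simple drift signal to act on before a control limit is breached.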

Thanks in advance. Sreenath

by Sreenath1986 at August 22, 2016 03:08 PM

Is there anything wrong with using this custom MySQLi escaping PHP function?

I wrote a basic wrapper function to escape a string using MySQLi. Is there anything wrong with using this? Is it better than the original? Is it useful?

The function takes two arguments, $conn, which is the MySQLi connection, and &$var, which is the string you want to escape.

function escapestr($conn, &$var){
    $var = $conn->real_escape_string($var);
    return $var;
}


$conn = mysqli_connect("localhost", "username", "password", "my_favourite_db");
$userInput = $_GET["input"]; // value: this is my "inputted" string
$userInput = escapestr($conn, $userInput); // value: this is my \"inputted\" string

Or, it can directly update the variable.

$conn = mysqli_connect("localhost", "username", "password", "my_favourite_db");
$userInput = $_GET["input"]; // value: this is my "inputted" string
escapestr($conn, $userInput); // value: this is my \"inputted\" string

by Eddie Hart at August 22, 2016 03:03 PM


Definition of BSE's Investor Categorywise Turnover

I am not entirely sure that this question is on-topic here, but the intent of the question is to understand the definition of a financial metric precisely, and to understand what use it serves in quantitative financial analysis.

I was trying to understand the definition of some data that I have downloaded from BSE's website. It is ostensibly called F&O Investor Categorywise Turnover, and is available here. I have downloaded a certain history of the data and placed it here.

This is what it looks like:

(chart of the daily buy/sell quantities by investor category)

From the data, or the chart, it can be seen that the daily buy and sell quantities are the same for all investor categories. I can believe that this would be true for proprietary traders, who cannot carry over their positions, but it also seems to be true for "Others", which includes private and domestic institutional investors, and also for foreign institutional investors.

My questions:

  • is there a definition for these metrics that means that they must (almost) balance at the end of the day? Is there an equilibrium condition that causes them to be almost equal?
  • is there a use for these metrics, either macroeconomic or financial?


by tchakravarty at August 22, 2016 02:58 PM



NFS mount fails at boot time

I have the following in /etc/fstab on FreeBSD:

venture:/usr/redacted    /usr/local/redacted   nfs     rw      0       0

This fails during boot. However, after boot, the following command succeeds

mount -t nfs venture:/usr/redacted /usr/local/redacted

Two related questions:

1) last time I rebooted at the console (this machine is in a datacenter), I'm pretty sure I saw an explanatory message at boot time regarding the failure to mount. I think it had something to do with resolving the hostname. However, this message does not appear in /var/log/messages with other boot-time messages; is there someplace else I should be looking?

2) Any thoughts about what could be preventing the hostname from resolving at boot time, but no problem 30 seconds later from the command prompt?

by davidcl at August 22, 2016 02:20 PM


difference between (>>=) and (>=>)

I need some clarification regarding (>>=) and (>=>).

*Main Control.Monad> :type (>>=)                                                                                                                                                               
(>>=) :: Monad m => m a -> (a -> m b) -> m b                                                                                                                                                
*Main Control.Monad> :type (>=>)                                                                                                                                                               
(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> a -> m c 

I know about the bind operator (>>=), but I am not getting the context where (>=>) is useful. Please explain with a simple toy example.

Edit: corrected based on @Thomas's comments.

by venu gangireddy at August 22, 2016 02:12 PM


The International Criminal Court in The Hague has ...

The International Criminal Court in The Hague has heard a case of cultural destruction for the first time. It concerns Islamists who destroyed historic shrines in Timbuktu. Here is what happened:
It is the first time that the court in The Hague has tried a case of cultural destruction.

It is also the first time a suspected Islamist militant has stood trial at the ICC and the first time a suspect has pleaded guilty.

Of course, that doesn't do the shrines much good anymore. But I do find it remarkable that the first person to stand trial for his deeds is an Islamist.

August 22, 2016 02:00 PM

The Süddeutsche is taking a stab at investigating how ...

The Süddeutsche is taking a stab at investigating how, and how much, Facebook actually censors.

Funny, for a few years now nobody has been asking me why I'm not on Facebook and why I don't distribute my content there. My impression is that we've now moved into the guilty-conscience phase. People still have their accounts, but they only use them with a guilty conscience now. Just like with Geocities and Myspace back in the day. It's nothing but lethargy keeping you there.

August 22, 2016 02:00 PM


Prove that $C = \{ x \in N : [0,x] \subseteq W_x \}$ is not saturated

How would one go proving, by using the second recursion theorem, that $C = \{ x \in N : [0,x] \subseteq W_x \}$ (where $W_x$ is the domain of $\phi_x$) is not saturated?

Below is my attempt at a proof so far. I suspect it's wrong, but I can't find the mistake, if there is one. In the latter case, how can it be fixed?

EDIT: There is. The Second Recursion Theorem guarantees that $n$ exists, but it doesn't guarantee that $n \in C$.

A set $A$ is saturated $\overset{\Delta}{\equiv}$ $x \in A \wedge \phi_x = \phi_y \Rightarrow y \in A$

So, suppose $C$ is saturated.

The second recursion theorem states that for all $h$ total, computable: $$ \exists n : \phi_n = \phi_{h(n)} $$

It would then be sufficient to construct a total, computable $h$ and show that $h(n) \not\in C$ for all $n \in C$, including the $n$ such that $\phi_n = \phi_{h(n)}$ that is guaranteed to exist by the 2RT.

This way we would have an $n \in C$ such that $\phi_n = \phi_{h(n)}$ but $h(n) \not\in C$, so the saturation property would not hold.


$$g(x,y) = \begin{cases} \uparrow & y = 0 \\ y & y \neq 0 \end{cases} $$

The smn theorem guarantees that there is a computable function $s$ s.t. $ \phi_{s(x)} (y) = g(x,y)$.

Clearly $0 \not\in \mathrm{Dom}(\phi_{s(x)})$ for any $x$, and thus $[0,s(x)] \not\subseteq W_{s(x)}$: hence $s(x) \not\in C$ for all $x$.

Let $h = s$.

Now $\forall x \in C,\ h(x) \not\in C$, and by the second recursion theorem we have a counterexample that shows that the saturation property does not hold:

$$n : n \in C \wedge \phi_n = \phi_{h(n)} \wedge h(n) \not\in C$$

This concludes the proof.

by Tobia Tesan at August 22, 2016 01:56 PM

Formal way to model or describe distributed systems architecture

I've been tasked to create the systems architecture for a distributed system.

One approach to designing this system is to pick systems architecture patterns, and then evaluate different technologies that implement those architectural patterns.

For example, a particular architecture might call for a message bus, and given that, I could choose between various off-the-shelf open source or commercial projects that implement a message bus.

While this approach yields a nice white-board diagram, and a high-level understanding of how the system will work, some drawbacks are:

  • it's difficult to gauge the performance of the system as a whole without fully implementing it
  • it's difficult to determine how well each pattern / implementation will mesh with the other components
  • because of that, choosing between patterns tends to come down to gut feeling, along the lines of "Kafka is cool, I used it on project X and it did really well"
  • there are no hard guarantees about the performance of the system as a whole (consistency, availability, etc.)

Is there a formal approach to modeling distributed systems? Ideally one that provides a way to abstract patterns, and provide analytic tools for making predictions about the behavior of the system?

by juwiley at August 22, 2016 01:52 PM


mysqld_safe not starting after installing the MySQL 5.7.13 port on FreeBSD/amd64 10.3 using pkg

I have a problem running the MySQL server. I've just installed FreeBSD 10.3 and I want to run a MySQL server here, but the process doesn't start.

Here are all the commands I ran after installing FreeBSD, step by step:

portsnap fetch extract
pkg update
pkg install mysql57-server

/* Here MySQL says something about a .mysql_secret file containing the root password, but it is not generated at all. I can search, but there is no result: */

find / -iname .mysql_secret

When I try to run MySQL for the first time using this command:

mysqld_safe --initialize --user=mysql

I get this:

mysqld_safe Logging to '/var/db/mysql/host.err'
mysqld_safe Starting mysqld daemon with databases from /var/db/mysql
mysqld_safe mysqld from pid file /var/db/mysql/ ended

Here is /var/db/mysql/host.err:

2016-08-22T11:56:27.6NZ mysqld_safe Starting mysqld daemon with databases from /var/db/mysql
2016-08-22T11:56:27.533572Z 0 [ERROR] --initialize specified but the data directory has files in it. Aborting.
2016-08-22T11:56:27.533635Z 0 [ERROR] Aborting

2016-08-22T11:56:27.6NZ mysqld_safe mysqld from pid file /var/db/mysql/ ended

I found something similar:

There is still no solution. Any ideas? I really need MySQL. I have tried MySQL 5.6 too. Same problem...

And finally, /usr/local/etc/mysql/my.cnf:

# $FreeBSD: branches/2016Q3/databases/mysql57-server/files/ 414707 2016-05-06 14:39:59Z riggs $

port                            = 3306
socket                          = /tmp/mysql.sock

prompt                          = \u@\h [\d]>\_

user                            = mysql
port                            = 3306
socket                          = /tmp/mysql.sock
bind-address                    =
basedir                         = /usr/local
datadir                         = /var/db/mysql
tmpdir                          = /var/db/mysql_tmpdir
slave-load-tmpdir               = /var/db/mysql_tmpdir
secure-file-priv                = /var/db/mysql_secure
log-bin                         = mysql-bin
log-output                      = TABLE
master-info-repository          = TABLE
relay-log-info-repository       = TABLE
relay-log-recovery              = 1
slow-query-log                  = 1
server-id                       = 1
sync_binlog                     = 1
sync_relay_log                  = 1
binlog_cache_size               = 16M
expire_logs_days                = 30
default_password_lifetime       = 0
enforce-gtid-consistency        = 1
gtid-mode                       = ON
safe-user-create                = 1
lower_case_table_names          = 1
explicit-defaults-for-timestamp = 1
myisam-recover-options          = BACKUP,FORCE
open_files_limit                = 32768
table_open_cache                = 16384
table_definition_cache          = 8192
net_retry_count                 = 16384
key_buffer_size                 = 256M
max_allowed_packet              = 64M
query_cache_type                = 0
query_cache_size                = 0
long_query_time                 = 0.5
innodb_buffer_pool_size         = 1G
innodb_data_home_dir            = /var/db/mysql
innodb_log_group_home_dir       = /var/db/mysql
innodb_data_file_path           = ibdata1:128M:autoextend
innodb_temp_data_file_path      = ibtmp1:128M:autoextend
innodb_flush_method             = O_DIRECT
innodb_log_file_size            = 256M
innodb_log_buffer_size          = 16M
innodb_write_io_threads         = 8
innodb_read_io_threads          = 8
innodb_autoinc_lock_mode        = 2

max_allowed_packet              = 256M

by qwaler at August 22, 2016 01:45 PM


Poor results with tensorflow DNNClassifier and cross_val_score

I am using python 3.5, tensorflow 0.10 and its DNNClassifier. If I perform a single training and testing stage, as below, the test result is decent: accuracy = 0.9333

import tensorflow as tf
from tensorflow.contrib import learn
from sklearn.cross_validation import cross_val_score, ShuffleSplit, train_test_split
from sklearn.metrics import accuracy_score
import numpy as np
from sklearn import datasets, cross_validation

iris = datasets.load_iris()

feature_columns = learn.infer_real_valued_columns_from_input(iris.data)

x_train, x_test, y_train, y_test = train_test_split(iris.data,,
                                                    test_size=0.20, random_state=20)

model = learn.DNNClassifier(hidden_units=[5], n_classes=3,
                            feature_columns=feature_columns), y_train, steps=1000)
predicted = model.predict(x_test)

print('Accuracy on test set: %f' % accuracy_score(y_test, predicted))

If I use sklearn's cross_val_score, then the final result is much poorer, about 0.33 accuracy:

model = learn.DNNClassifier(hidden_units=[5], n_classes=3,
                            feature_columns=feature_columns)

scores = cross_val_score(estimator=model,,,
                         scoring='accuracy',
                         fit_params={'steps': 1000},
                         # verbose=100
                         )

The scores and their mean are:

[ 0.          0.33333333  1.          0.33333333  0.        ]

What's wrong with my code in cross validation estimation?

by lmsasu at August 22, 2016 01:40 PM


Relation between "syntax" and "grammar" in CS

I am fairly sure that "grammar" and "syntax" are two different things in CS, e.g.

Syntax of Java language is defined by a context-free grammar.

My questions are:

What is the difference between the definitions of "grammar" and "syntax" in CS?

What is the relation between them? Can we describe it using set theory?

by fronthem at August 22, 2016 01:33 PM

Set notation of the set of all strings

How do I represent the complement using set notation?

I guess it has to be shown as the universal set minus {aa,bb}, but I do not know how to represent the universal set in set notation, since the strings of the universal set can be anything over the alphabet {a,b}. So how do I represent it? I guess something like this might work if we go from outside to inside: {{{a,b}*}*}. Any help is appreciated.
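For reference, the set of all strings over the alphabet $\{a,b\}$ is conventionally written with the Kleene star, so the complement can be expressed as a set difference (one standard notation among several):

$$\overline{\{aa,\,bb\}} \;=\; \{a,b\}^{*} \setminus \{aa,\,bb\} \;=\; \{\, w \in \{a,b\}^{*} : w \neq aa \text{ and } w \neq bb \,\}$$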

by aste123 at August 22, 2016 01:31 PM


Suggested hardcopy book for algorithms and patterns for my upcoming 16hr trip to China [& 16 back]?

As you may know from my previous post, I am going to China to see family. I am specifically learning Perl 6, but I want a good refresher on the basics, and a book might help me. Thank you in advance.

Edited — Changed first word from “Best” to “Suggested” and added a question mark for clarity. - Author.

by Usermac at August 22, 2016 01:29 PM


Classes with static arrow functions

I'm currently implementing the static-land specification (an alternative to fantasy-land). I want to use not only plain objects as types, but also ES2015 classes with static methods. I've implemented these static methods as arrow functions in curried form instead of as normal methods. However, this isn't possible with ES2015 classes:

class List extends Array {
  static map = f => xs => => f(x))
  static of = x => [x]
}

My map doesn't need its own this, because it is merely a curried function on the List constructor. To make it work I have to write static map(f) { return xs => => f(x)) }, which is very annoying.

  • Why can't I use arrow functions along with an assignment expression in ES2015 classes?
  • Is there a concise way to achieve my goal anyway?

by LUH3417 at August 22, 2016 01:19 PM

How do I map a range to my custom slider?

I'm making a custom slider. I have two numbers...

Min = 120 Max = 400

I want: 120 = 0% 400 = 100%

I don't know the equation for this. I want to map a value in a custom range to a percentage... How do I do it?
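The mapping asked for here is linear interpolation between Min and Max; a minimal Python sketch (function names are mine):

```python
def to_percent(value, vmin=120.0, vmax=400.0):
    """Linearly map a slider value in [vmin, vmax] to a percentage in [0, 100]."""
    return (value - vmin) / (vmax - vmin) * 100.0

def from_percent(pct, vmin=120.0, vmax=400.0):
    """Inverse mapping: a percentage back to a slider value."""
    return vmin + pct / 100.0 * (vmax - vmin)
```

With vmin = 120 and vmax = 400, to_percent(120) gives 0 and to_percent(400) gives 100, as requested.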

by Andrews at August 22, 2016 01:17 PM

Using tso to identify outliers warning

I'm trying to identify the outliers in my data using the code below with the function tso from the package tsoutliers in R. I'm fairly certain my data has level shifts. I'm getting the warning below, and I'm unclear what the issue is.



tsSeries<-ts(xData, frequency=168)


Warning messages:

1: In auto.arima(x = c(11, 14, 17, 5, 5, 5.5, 8, NA, 5.5, 6.5, 8.5,  :
  Unable to fit final model using maximum likelihood. AIC value approximated
2: In auto.arima(x = c(11, 14, 17, 5, 5, 5.5, 8, NA, 5.5, 6.5, 8.5,  :
  Unable to fit final model using maximum likelihood. AIC value approximated

I updated my code and added na.approx. The code now finds outliers in the data and returns the set of warnings below. Why does the code now find outliers when it didn't before? Are these legitimate outliers? What do the warnings below mean, and are there settings I should change in tso to resolve them? All tips very much appreciated.

##Updated Code

test<-ts(na.approx(xData), frequency=168)


Warning messages:

1: In auto.arima(x = c(11, 14, 17, 5, 5, 5.5, 8, 6.75, 5.5, 6.5, 8.5,  :
  Unable to fit final model using maximum likelihood. AIC value approximated
2: In sqrt(diag(fit$var.coef)[id]) : NaNs produced
3: In auto.arima(x = c(11, 14, 17, 5, 5, 5.5, 8, 6.75, 5.5, 6.5, 8.5,  :
  Unable to fit final model using maximum likelihood. AIC value approximated
4: In auto.arima(x = c(11, 14, 17, 5, 5, 5.5, 8, 6.75, 5.5, 6.5, 8.5,  :
  Unable to fit final model using maximum likelihood. AIC value approximated


c(11, 14, 17, 5, 5, 5.5, 8, NA, 5.5, 6.5, 8.5, 4, 5, 9, 10, 11, 
7, 6, 7, 7, 5, 6, 9, 9, 6.5, 9, 3.5, 2, 15, 2.5, 17, 5, 5.5, 
7, 6, 3.5, 6, 9.5, 5, 7, 4, 5, 4, 9.5, 3.5, 5, 4, 4, 9, 4.5, 
6, 10, NA, 9.5, 15, 9, 5.5, 7.5, 12, 17.5, 19, 7, 14, 17, 3.5, 
6, 15, 11, 10.5, 11, 13, 9.5, 9, 7, 4, 6, 15, 5, 18, 5, 6, 19, 
19, 6, 7, 7.5, 7.5, 7, 6.5, 9, 10, 5.5, 5, 7.5, 5, 4, 10, 7, 
5, 12, 6, NA, 4, 2, 5, 7.5, 11, 13, 7, 8, 7.5, 5.5, 7.5, 15, 
7, 4.5, 9, 3, 4, 6, 17.5, 11, 7, 6, 7, 4.5, 4, 4, 5, 10, 14, 
7, 7, 4, 7.5, 11, 6, 11, 7.5, 15, 23.5, 8, 12, 5, 9, 10, 4, 9, 
6, 8.5, 7.5, 6, 5, 8, 6, 5.5, 8, 11, 10.5, 4, 6, 7, 10, 11.5, 
11.5, 3, 4, 16, 3, 2, 2, 8, 4.5, 7, 4, 8, 11, 6.5, 7.5, 17, 6, 
6.5, 9, 12, 17, 10, 5, 5, 9, 3, 8.5, 11, 4.5, 7, 16, 11, 14, 
6.5, 15, 8.5, 7, 6.5, 11, 2, 2, 13.5, 4, 2, 16, 11.5, 3.5, 9, 
16.5, 2.5, 4.5, 8.5, 5, 6, 7.5, 9.5, NA, 9.5, 8, 2.5, 4, 12, 
13, 10, 4, 6, 16, 16, 13, 8, 12, 19, 19, 5.5, 8, 6.5, NA, NA, 
NA, 15, 12, NA, 6, 11, 8, 4, 2, 3, 4, 10, 7, 5, 4.5, 4, 5, 11.5, 
12, 10.5, 4.5, 3, 4, 7, 15.5, 9.5, NA, 9.5, 12, 13.5, 10, 10, 
13, 6, 8.5, 15, 16.5, 9.5, 14, 9, 9.5, 11, 15, 14, 5.5, 6, 14, 
16, 9.5, 23, NA, 19, 12, 5, 11, 16, 8, 11, 9, 13, 6, 7, 3, 5.5, 
7.5, 19, 6.5, 5.5, 4.5, 7, 8, 7, 10, 11, 13, NA, 12, 1.5, 7, 
7, 12, 8, 6, 9, 15, 9, 3, 5, 11, 11, 8, 6, 3, 7.5, 4, 7, 7.5, 
NA, NA, NA, NA, 6.5, 2, 16.5, 7.5, 8, 8, 5, 2, 7, 4, 6.5, 4.5, 
10, 6, 4.5, 6.5, 9, 2, 6, 3.5, NA, 5, 7, 3.5, 4, 4.5, 13, 19, 
8.5, 10, 8, 13, 10, 10, 6, 13.5, 12, 11, 5.5, 6, 3.5, 9, 8, NA, 
6, 5, 8.5, 3, 12, 10, 9.5, 7, 24, 7, 9, 11.5, 5, 7, 11, 6, 5.5, 
3, 4.5, 4, 5, 5, 3, 4.5, 6, 10, 5, 4, 4, 9.5, 5, 7, 6, 3, 13, 
5.5, 5, 7.5, 3, 5, 6.5, 5, 5.5, 6, 4, 3, 5, NA, 5, 5, 6, 7, 8, 
5, 5.5, 9, 6, 8.5, 9.5, 8, 9, 6, 12, 5, 7, 5, 3.5, 4, 7.5, 7, 
5, 4, 4, NA, 7, 5.5, 6, 8.5, 6.5, 9, 3, 2, 8, 15, 6, 4, 10, 7, 
13, 14, 9.5, 9, 18, 6, 5, 4, 6, 4, 11.5, 17.5, 7, 8, 10, 4, 7, 
5, 9, 6, 5, 4, 8, 4, 2, 1.5, 3.5, 6, 5.5, 5, 4, 8, 10.5, 4, 11, 
9.5, 5, 6, 11, 21, 9.5, 11, 13.5, 7.5, 13, 10, 7, 9.5, 6, 10, 
5.5, 6.5, 12, 10, 10, 6.5, 2, 8, NA, 10, 5, 4, 4.5, 5, 7.5, 12, 
22, 5, 8.5, 2.5, 3, 10.5, 4, 7, 13, 4, 3, 5, 6.5, 3, 9, 9.5, 
16, NA, 4, 12, 4.5, 7, 5.5, 8, 14, 3, 8, 12, 14, 7, 8, 6, 8.5, 
6, 6.5, 15.5, 13, 3.5, 12, 7, 6, NA, 3, 5.5, 8.5, 9, 12, 13, 
8, 6.5, 8, 3, 5, 16.5, 2, 7, 6, 2, 5, 6.5, 3, 3, 7, 2, NA, 13, 
7, 16, 13, 12.5, 12, 7, 13, 11, 21.5, 16, 20, 3, 4, 5, 7, 11, 
7, 9, 11, 7, 13, 4, 14, 5, 12, 6, 7, 9, 12, 7, 12.5, 6.5, 16, 
5, 12, 9, 9.5, 9, 7, 9.5, 3, 13, 8, 7, 7, 7, 9, 6, 6, 11, 15, 
9, 6, 19, 10.5, 4, 6, 14.5, 9, 17, 14, 4, 16, 5, 6.5)

by user6183069 at August 22, 2016 01:05 PM


Kennt ihr die Hypothese, dass der Staat grundsätzlich ...

Kennt ihr die Hypothese, dass der Staat grundsätzlich unfähig ist, und man lieber Privatwirtschaft ranlassen sollte?

Nun, in Schottland haben sie das mal mit den öffentlichen Schulen getestet.


More than 200 schools built in Scotland under private finance initiative (PFI) schemes are now at least partially owned by offshore investment funds.
Yeah! Das ist doch mal eine gute Wahl! Endlich macht das mal jemand richtig! Diese Geldverschwendung immer bei der öffentlichen Hand!

Und? Wie läuft es so? Utopia?

The 17 schools built in Edinburgh under PPP1 were closed for repairs earlier this year after construction faults were found.
Oh. Hmm. Nun, äh, das wäre bestimmt auch unter anderen Umständen passiert!1!!

August 22, 2016 01:00 PM


What is the problem of sorting into contiguous runs called?

I am having a bit of brain fail and I can't remember the name of the following problem (so that I can find some literature around it...).

Given a sequence of values, sort it in such a way that equal elements are compacted into runs (contiguous subsequences of identical elements).

For instance:

$$ \{1, 2, 4, 2, 1, 3\} \rightarrow \{ 2, 2, 4, 3, 1, 1 \} $$

The runs are not otherwise sorted -- only equality comparison is required, not ordering; and they're compacted (there should not be two different runs containing equal elements).
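I don't know a canonical name for it either, but since only equality and hashing are needed, one way to compute such a compacted ordering is a single counting pass (a sketch; the function name is mine):

```python
def compact_runs(seq):
    """Reorder seq so equal elements form contiguous runs, one run per
    distinct value, in order of first appearance. Requires only equality
    and hashing, no ordering between distinct values."""
    counts = {}                      # insertion-ordered in Python 3.7+
    for x in seq:
        counts[x] = counts.get(x, 0) + 1
    out = []
    for x, c in counts.items():
        out.extend([x] * c)
    return out

# the example from the question (runs in order of first appearance):
assert compact_runs([1, 2, 4, 2, 1, 3]) == [1, 1, 2, 2, 4, 3]
```

Because runs need not be sorted, this linear-time pass already satisfies the specification; any permutation of the runs would do equally well.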

by peppe at August 22, 2016 01:00 PM



Reducing k Vertex Cover to SAT (last clause problem)

I am working on a transformation from k Vertex Cover to SAT and I have some issues regarding the last clause in the boolean formula.

Here is my approach: $$\forall \text{ nodes } n_i \in V, \text{ invent variables } v_i$$ $$\forall \text{ edges } (n_i, n_j) \in E, \text{ invent the terms } (v_i \lor v_j) \Rightarrow C_{edges}$$ $$\text{ encode the proposition: exactly } k \ v_i \text{ variables are true} \Rightarrow C_{prop}$$

$$\varphi = C_{edges} \land C_{prop}$$


$$C_{edges} = \land_{(n_i, n_j) \in E} \ (v_i \lor v_j)$$

Now, we invent $\frac{n(n+1)}{2}$ new variables, $y_{ij}$ (one for each $i = 1 \cdots n$ and $j = 1 \cdots i$), with the following meaning:

$$y_{ij} = 1 \iff \text{exactly } j \text{ variables in } v_1 \cdots v_i \text{ are } 1$$


$$y_{ij} = 1 \iff (y_{i-1,j} \land \neg v_i) \lor (y_{i-1,j-1}\land v_i)$$


$$y_{ij} \equiv (y_{i-1,j} \land \neg v_i) \lor (y_{i-1,j-1}\land v_i)$$


$$i = 2\cdots n, \ j = 1\cdots i \text{ and for } i = 1, \text{ we have } y_{11} \equiv v_1$$

Thus, $C_{prop} = y_{nk}$

Now, I should write $y_{nk}$ in terms of $v_1 \cdots v_n$, but I suppose that might exceed the polynomial-time constraint of the transformation.

Is this approach correct or am I missing something? Is there a way I can express $y_{nk}$ without writing it in terms of $v_i$, in the end?
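The encoding above is essentially a sequential-counter cardinality constraint, and there is no need to expand $y_{nk}$ in terms of the $v_i$: the Tseitin-style equivalences themselves become clauses of $\varphi$. As a sanity check, the recurrence (with base cases $y_{0,0} = \top$ and $y_{i,j} = \bot$ for $j > i$) can be verified exhaustively for small $n$; a Python sketch (names are mine):

```python
from itertools import product

def exactly_k(v, k):
    """Evaluate y_{n,k} via the recurrence
    y_{i,j} = (y_{i-1,j} AND NOT v_i) OR (y_{i-1,j-1} AND v_i),
    with base cases y_{0,0} = True and y_{i,j} = False for j > i."""
    n = len(v)
    y = [[False] * (n + 1) for _ in range(n + 1)]
    y[0][0] = True
    for i in range(1, n + 1):
        for j in range(i + 1):
            keep = y[i - 1][j] and not v[i - 1]
            take = j >= 1 and y[i - 1][j - 1] and v[i - 1]
            y[i][j] = bool(keep or take)
    return y[n][k]

# exhaustive check against the direct meaning "exactly k of the v_i are true"
for n in range(1, 6):
    for bits in product([False, True], repeat=n):
        for k in range(n + 1):
            assert exactly_k(list(bits), k) == (sum(bits) == k)
```

Since there are $O(n^2)$ auxiliary variables and each equivalence contributes a constant number of clauses, the whole transformation stays polynomial.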

by Alexandru Dinu at August 22, 2016 12:28 PM


The Relation Between the Ricci flow and the Black-Scholes-Merton Equation

Grisha Perelman once wrote that

The Ricci-flow equation, a type of heat equation, is a distant relative of the Black-Scholes equation that bond traders around the world use to price stock and bond options.

Wilmott has shown how to transform the Black-Scholes equation into the heat equation, but I wonder whether there is any proof that you can obtain the BS equation from the Ricci flow.

by Dendi Suhubdy at August 22, 2016 12:27 PM


The Tradeoff of Multiple Repositories

More often than I expect, I come across software projects that consist of multiple source control repositories. The reasons vary. Perhaps it’s thought that the web frontend and backend aren’t tightly coupled and don’t need to be in the same repository. Perhaps there’s code that’s meant to be used throughout an entire organization. Regardless, there are real costs involved in the decision to have a development team work in distinct, yet related, repositories. I believe these costs are always overlooked.

Double (or n Times) the Gruntwork

The most obvious cost involved is additional gruntwork. Let’s imagine a project with a mobile app and web service, each having its own Git repository. When it’s time to start a new feature, the feature branch will need to be created twice. When the work is finished, two pull requests will need to be made. When it’s appropriate to make a commit, it might need to be done twice. When it’s time to push, it might need to be done twice. To help manage all of this, an extra terminal might be appropriate.

Individually, none of these costs is very significant. Collectively, they represent a moderate inconvenience and cognitive burden. I’ve seen developers weigh this and decide it’s worth the cost, because they are trying to achieve some other ideal.

Ultimately, these inconveniences are just symptoms of a more fundamental—and easily overlooked—tradeoff.

Context: Not Version-Controlled

A repository is essentially a set of snapshots in time. For any commit, it's easy to see not only what changes were made, but also precisely what every other file contained at that point in time. This is pretty obvious; after all, it's one of the biggest selling points of version control.

With a project consisting of one single repository, that snapshot encapsulates everything there is to know about the source code. Once there are multiple repositories involved in a single project, this context is fragmented.

This fragmentation manifests in various ways. Let’s look at some examples:

  • When moving code between repositories, neither one has knowledge of the other. Information about where the code came from or went is lost.
  • If a branch in your frontend repo depends on the server running a corresponding branch, there's no native or reasonable way to express that relationship. Information is lost.

The Real Tradeoff of Multiple Repositories

Breaking a project into multiple repositories involves a fundamental tradeoff. By doing so, information about the broader context of the application is pushed entirely outside of version control.

Although it’s possible to work to counteract this, for example, by establishing team practices, using Git submodules, or building custom machinery, it will require work. That’s work spent to regain what you get for free by using a single repository.

Therefore, the most likely place that this information will move is into the culture and individual minds of the team. This is a much more ephemeral and unreliable place than a source repository. It makes it harder to onboard new developers and coordinate things like continuous integration.


It’s up to your unique situation whether it’s a win or loss to split your code into multiple repositories, but the costs are both real and easily overlooked. I’d strongly suggest weighing these tradeoffs thoughtfully. And, if you find yourself on a project where these costs are bringing you down, I’ve written a blog post on how to super-collide your repositories together.

The post The Tradeoff of Multiple Repositories appeared first on Atomic Spin.

by Chris Farber at August 22, 2016 12:00 PM

Fred Wilson

The Spillover Effect

The New York Times has a piece today about how Bay Area tech companies are giving the Phoenix, Arizona economy a boost.

I think this is a trend we are just seeing the start of.

A big theme of board meetings I’ve been in over the past year is the crazy high cost of talent in the big tech centers (SF, NYC, LA, Boston, Seattle) and the need to grow headcount in lower cost locations.

This could mean outside of the US in places like Eastern Europe, Asia, India, but for the most part the discussions I have been in have centered on cities in the US where there is a good well educated work force, an increasing number of technically skilled workers, and a much lower cost of living. That could be Phoenix, or it could be Indianapolis, Pittsburgh, Atlanta, and a host of other really good places to live in the US.

Just like we are seeing tech seep into the strategic plans of big Fortune 1000 companies, we are seeing tech seep into the economic development plans of cities around the US (and around the world). Tech is where the growth opportunities are right now.

A good example of how this works is Google’s decision to build a big office in NYC in the early part of the last decade and build (and buy) engineering teams in that office. Google is now a major employer in NYC and the massive organization they have built has now spilled over into the broader tech sector in NYC. My partner Albert calls Google’s NYC office “the gift that Google gave NYC.”

We will see that story play out across many cities in the US (and outside of the US) in the next five to ten years. It is simply too expensive for most companies to house all of their employees in the Bay Area or NYC. And so they will stop doing that and go elsewhere for talent. That's a very healthy and positive dynamic for everyone, including the big tech centers that are increasingly getting too expensive for many tech employees to live in.

by Fred Wilson at August 22, 2016 11:59 AM



how can I improve my LSTM code on tensorflow?

I am trying to predict household power consumption using an LSTM. The following is a small portion of the input data (the total training input is about 1M records) that I am using to train my model, followed by my LSTM code using TensorFlow. I need some help with:

(1) Verification of my model: I would like a network with 2 LSTM layers of size 512 with time_step = 10. I think I only have 1 LSTM hidden layer, but I am not sure how to add another.

(2) General improvement of my model: it seems like the accuracy is not converging to any specific value, and I am not quite sure where to look at this point. Any advice on modeling/stacking layers/choice of optimizer/etc. will be much appreciated.

Input data (:


LSTM code :

from getdata_new import DataFromFile
import numpy as np
import tensorflow as tf
import datetime

# ==========
# ==========

# Parameters
learning_rate = 0.01
training_iters = 100000
batch_size = 1000
display_step = 10

# Network Parameters
seq_len = 10 # Sequence length
n_hidden = 512 # hidden layer num of features
n_classes = 1201 #
n_input = 13
num_layers = 2

trainset= DataFromFile(filename="/tmp/train_data.txt",delim=";")

# Define weights
weights = {
      'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
      'out': tf.Variable(tf.random_normal([n_classes]))
}

x = tf.placeholder("float", [None, seq_len, n_input])
y = tf.placeholder("float", [None, n_classes])

def RNN(x, weights, biases):
      x = tf.transpose(x, [1, 0, 2])
      x = tf.reshape(x, [-1, n_input])
      x = tf.split(0, seq_len, x)

      lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)

      outputs, states = tf.nn.rnn(lstm_cell, x, dtype=tf.float32)

      # Linear activation, using outputs computed above
      return tf.matmul(outputs[-1], weights['out']) + biases['out']

pred = RNN(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(pred, y))
optimizer =     tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

testPred = tf.argmax(pred,1)

# Initializing the variables
init = tf.initialize_all_variables()

# Launch the graph
with tf.Session() as sess:
    acc = 0.0
    for step in range(1,training_iters+1):
        batch_x, batch_y = trainset.train_next(batch_size,seq_len)

        # Run optimization op (backprop), feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
            # Calculate batch accuracy
            acc =, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss
            loss =, feed_dict={x: batch_x, y: batch_y})
            print "Iter " + str(step) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc)
    print "Optimization Finished!"

Result :

Iter 99860, Minibatch Loss= 0.015933, Training Accuracy= 0.29900
Iter 99870, Minibatch Loss= 0.015993, Training Accuracy= 0.26200
Iter 99880, Minibatch Loss= 0.015783, Training Accuracy= 0.30500
Iter 99890, Minibatch Loss= 0.016071, Training Accuracy= 0.27200
Iter 99900, Minibatch Loss= 0.015390, Training Accuracy= 0.40300
Iter 99910, Minibatch Loss= 0.015247, Training Accuracy= 0.43700
Iter 99920, Minibatch Loss= 0.015264, Training Accuracy= 0.42700
Iter 99930, Minibatch Loss= 0.015212, Training Accuracy= 0.43800
Iter 99940, Minibatch Loss= 0.016164, Training Accuracy= 0.26500
Iter 99950, Minibatch Loss= 0.015923, Training Accuracy= 0.30800
Iter 99960, Minibatch Loss= 0.016338, Training Accuracy= 0.22600
Iter 99970, Minibatch Loss= 0.016327, Training Accuracy= 0.19000
Iter 99980, Minibatch Loss= 0.016322, Training Accuracy= 0.22300
Iter 99990, Minibatch Loss= 0.016608, Training Accuracy= 0.15400
Iter 100000, Minibatch Loss= 0.016809, Training Accuracy= 0.10700
Optimization Finished!

by Ben at August 22, 2016 10:39 AM


Booth bit-pair recoding technique

In the Booth bit-pair recoding technique, how do you multiply the multiplicand by -2 or 2? For example, while multiplying 01101 (+13, the multiplicand) by 11010 (-6, the multiplier), we get the recoded digits 0, -1, -2 for the multiplier, i.e. 01101 × (0 -1 -2). How do we multiply the multiplicand by -2?
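For what it's worth, a digit of ±2 is realized in hardware as a one-bit left shift of the multiplicand, with a two's-complement negation for negative digits; each partial product is then shifted two positions further left than the previous one. A Python sketch of the whole scheme (function names are mine; the digit formula is $d_i = -2b_{2i+1} + b_{2i} + b_{2i-1}$ with $b_{-1} = 0$):

```python
def booth_radix4_digits(multiplier, nbits):
    """Recode an nbits-wide two's-complement multiplier into radix-4
    Booth digits d_i in {-2, -1, 0, 1, 2}, least-significant digit first,
    using d_i = -2*b[2i+1] + b[2i] + b[2i-1] (with b[-1] = 0)."""
    mask = (1 << nbits) - 1
    bits = [((multiplier & mask) >> i) & 1 for i in range(nbits)]
    bits.append(bits[-1])          # sign-extend so the last pair is complete
    digits, prev_bit = [], 0
    for i in range(0, nbits, 2):
        digits.append(-2 * bits[i + 1] + bits[i] + prev_bit)
        prev_bit = bits[i + 1]
    return digits

def booth_multiply(m, q, nbits):
    """Multiply m by q using the recoded digits of q. A digit of +/-2 is
    a one-bit left shift of the multiplicand (negated for negative
    digits, i.e. the two's complement in hardware)."""
    total = 0
    for i, d in enumerate(booth_radix4_digits(q, nbits)):
        partial = (m << 1) if abs(d) == 2 else m * abs(d)
        if d < 0:
            partial = -partial     # take the two's complement
        total += partial << (2 * i)   # each digit weighs 4**i
    return total

# the example from the question: +13 times -6 over 5 bits
assert booth_radix4_digits(-6, 5) == [-2, -1, 0]   # i.e. 0, -1, -2 from the MSB
assert booth_multiply(13, -6, 5) == -78
```

So for the -2 digit here, the partial product is the two's complement of the multiplicand shifted left by one (13 → 26 → -26), which is then added at the digit's position.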

by Night Wolf at August 22, 2016 10:33 AM


Sending orders to CME [on hold]

For example:
I'm getting quotes (CQG, Rithmic, etc.),
then doing some math (bash, C#, Python, etc.),
and then sending orders to the CME.

What do I need in order to send orders? Please advise.

I heard that brokers have APIs for such purposes, right?
CQG API, Rithmic API, IBPy - is that what I need?

Or should I lease a CME membership?

Please explain.

My aim is not HFT, just ordinary intraday trading.
Thus, I don't need something like a 'super low latency API/datafeed'.

by Int at August 22, 2016 10:24 AM


Query regarding the structure of Graph over All (known/unknown) NPC Problems?

Let us consider the set of all NP-complete problems. Since every problem in the set is reducible to/from at least one known NP-complete problem, let's create a directed graph with the following conventions:

  1. If a problem B is reducible from problem A, create an edge from A to B. We assume an oracle who knows all possible reductions and who creates the edges.

Here are a few questions regarding the Graph.

Q1. Is the graph strongly connected (i.e. is every problem in the graph reachable from, or reducible from, every other problem, for every instance)?

Guess: I presume the answer is yes.

Q2. For any two problems (say A and B), no matter what the distance between them is, is there guaranteed to be at most a polynomial blowup in problem size when we reduce from A to B?

Guess: Unsure. For any problem P, if we reduce from one of its neighboring problems, there is a polynomial increase or decrease in size. But I'm not sure whether this holds if the problems are an arbitrarily large distance apart. The definition of NP-completeness needs one single reduction in polynomial time, but a space analogue of reduction for all NP-complete problem pairs seems out of reach.

by TheoryQuest1 at August 22, 2016 10:22 AM

ATL Property Pages [on hold]

I am trying to learn ATL and COM, and I'm currently looking at property pages. I added a property page to a blank project; the project builds successfully, but when I try to run it I get a pop-up error window saying:

Unable to start program \ATLProject\Debug\ATLProject.dll

\ATLProject\Debug\ATLProject.dll is not a valid Win32 application.

I am new to ATL so the Microsoft documentation doesn't help me much with that. Can someone recommend a good tutorial or tell me what I need to do in order to be able to open the property page? Thank you!

by Stargazer at August 22, 2016 10:20 AM


Planet Emacsen

Irreal: Mark Rectangle

If you're like me you don't often have occasion to mark rectangles so it's easy to forget how simple it is to do. Here's a nice reminder from Tony Garnock-Jones.

by jcs at August 22, 2016 09:55 AM


Discount Factor from euribor future [on hold]

I have U6 = 100.339, which is the price of a Euribor interest-rate future. How do I get the corresponding discount factor?
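As a rough sketch (ignoring the futures convexity adjustment and exact day-count conventions, and assuming a 3-month accrual fraction of 0.25; function names are mine), the future implies a forward rate of (100 − price)/100, which discounts over its period with simple compounding:

```python
def future_implied_rate(price):
    """IMM-style futures quote: implied forward rate = (100 - price) / 100."""
    return (100.0 - price) / 100.0

def discount_factor(rate, year_fraction):
    """Simple-compounding discount factor over the accrual period."""
    return 1.0 / (1.0 + rate * year_fraction)

rate = future_implied_rate(100.339)   # -0.339%: a negative implied rate
df = discount_factor(rate, 0.25)      # assumed 3-month accrual fraction
```

Note that with a quote above 100 the implied rate is negative, so this period discount factor comes out slightly above 1; to build a discount curve you would chain such factors (and strip the convexity adjustment), which is beyond this sketch.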

by PalimPalim at August 22, 2016 09:38 AM



Partial Function Application in Scala

I'm learning functional programming by following the book Functional Programming in Scala by Paul Chiusano and Rúnar Bjarnason. I'm specifically on chapter 3, where I am implementing some companion functions for a class representing a singly-linked list that the authors provided.

package fpinscala.datastructures

sealed trait List[+A]
case object Nil extends List[Nothing]
case class Cons[+A](head: A, tail: List[A]) extends List[A]

object List {
    def sum(ints: List[Int]): Int = ints match {
        case Nil => 0
        case Cons(x, xs) => x + sum(xs)
    }

    def product(ds: List[Double]): Double = ds match {
        case Nil => 1.0
        case Cons(0.0, _) => 0.0
        case Cons(x, xs) => x * product(xs)
    }

    def apply[A](as: A*): List[A] =
        if (as.isEmpty) Nil
        else Cons(as.head, apply(as.tail: _*))

    def tail[A](ls: List[A]): List[A] = ls match {
        case Nil => Nil
        case Cons(x, xs) => xs
    }
    // ... (more functions)
}

The functions I am implementing go inside the List object, as companion functions.

While implementing dropWhile, whose method signature is:

def dropWhile[A](l: List[A])(f: A => Boolean): List[A]

I came across some questions regarding partial function application:

In the book, the authors say that the predicate, f, is passed in a separate argument group to help the Scala compiler with type inference: if we do this, Scala can determine the type of f without any annotation, based on what it knows about the type of the List, which makes the function more convenient to use.

So, if we passed f in the same argument group, Scala would force the call to become something like this: val total = List.dropWhile(example, (x: Int) => 6 % x == 0), where we define the type of x explicitly, and we would "lose" the possibility of partial function application. Am I right?

However, why is partial function application useful in this case? Only to allow for type inference? Does it make sense to "partially apply" a function like dropWhile without applying the predicate f to it? It seems to me that the computation is halted before being useful if we don't apply f...

So... why is partial function application useful? And is this how it's always done, or is it something specific to Scala? I know Haskell has something called "complete inference" but I don't know its exact implications...

Thanks in advance

by Bruno Oliveira at August 22, 2016 08:45 AM


Regarding the Hurst Exponent

I tried calculating the Hurst exponent using C#, and compared the results to series with known exponents. I am having the following issue in my calculations:

1- All my results are negative instead of positive. I get numbers close to the following:

mean-reverting series: -1, random series: -0.5, and trending series: 0. The series have Hurst exponents of 0, 0.5, and 1.0, respectively. It appears as if I take the original value and subtract 1. I have been trying to figure out where my error is for a couple of days and can't seem to find it.

Has anyone come across this in the past? Are there any suggestions on how to fix it?
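For comparison, here is a small Python sketch of the lag-difference estimator (my own illustration, not the asker's C# code): for a series with Hurst exponent H, std(x[t+lag] - x[t]) scales like lag**H, so H is the slope of log(std) against log(lag). One common way to get exactly H - 1 is to regress log(std/lag) instead of log(std), since std/lag scales like lag**(H-1); that would match the symptom described.

```python
import numpy as np

def hurst(ts, lags=range(2, 50)):
    """Estimate the Hurst exponent from the scaling of lagged differences.

    For a series with Hurst exponent H, std(ts[lag:] - ts[:-lag]) ~ lag**H,
    so H is the slope of log(std) vs log(lag).
    """
    tau = [np.std(ts[lag:] - ts[:-lag]) for lag in lags]
    # Regressing log(tau) on log(lags) gives H; regressing log(tau/lags)
    # instead would give H - 1, the off-by-one described in the question.
    return np.polyfit(np.log(list(lags)), np.log(tau), 1)[0]

rng = np.random.RandomState(0)
rw = np.cumsum(rng.randn(10000))   # random walk: H should be near 0.5
print(round(hurst(rw), 2))
```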

by Yandy Chiara Chang at August 22, 2016 08:43 AM

Other numeraire choices when applying Feynman-Kac

All of the books and notes I have seen on the Feynman-Kac formula mostly apply it under the risk-neutral measure, i.e. for different interest rate models, stochastic volatility, etc. I think the risk-neutral measure can be replaced with any other measure associated with a traded numeraire $N(t)$ such that $$\frac{V(t)}{N(t)}=\mathbb{E}_t^N\left[\frac{V(T)}{N(T)}\right]$$ So what came to my mind is the annuity measure and swaption prices, or the forward measure and cap prices. However, I could not find any references on those PDEs. Can someone point me to some references, or provide examples of different measures and how the PDE is derived in that case? It would be especially useful if the example is a "real application" one that can be seen in practice when pricing financial instruments.
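For intuition, here is a generic derivation sketch (standard change-of-numeraire reasoning; the symbols $X_t$, $\mu^N$, $\sigma$ are my own placeholders, not from any particular reference). Suppose that under the measure $\mathbb{Q}^N$ associated with numeraire $N$, the state follows $dX_t = \mu^N(t,X_t)\,dt + \sigma(t,X_t)\,dW_t^N$. Since $V(t)/N(t)$ is a $\mathbb{Q}^N$-martingale, write $V(t)=N(t)\,u(t,X_t)$ with $u(t,x)=\mathbb{E}^N\left[V(T)/N(T)\mid X_t=x\right]$; Feynman-Kac then gives a PDE with no discounting term, because the martingale property already absorbs it:

```latex
\frac{\partial u}{\partial t}
  + \mu^N(t,x)\,\frac{\partial u}{\partial x}
  + \frac{1}{2}\,\sigma^{2}(t,x)\,\frac{\partial^{2} u}{\partial x^{2}} = 0,
\qquad u(T,x)=\frac{V(T)}{N(T)} .
```

The numeraire-specific content sits entirely in the drift $\mu^N$: for instance, the forward swap rate is driftless under its annuity measure (the basis of swaption pricing), and a forward Libor rate is driftless under its own forward measure (giving caplet PDEs).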

by Medan at August 22, 2016 08:42 AM


How do I improve this object design in Typescript?

I have created a class in Typescript that implements a simple stream (FRP). Now I want to extend it with client side functionality (streams of events). To illustrate my problem, here is some pseudo-code:

class Stream<T> {

    map<U>(f: (value: T) => U): Stream<U> {
        // Creates a new Stream instance that maps the values.
    }

    // Quite a few other functions that return new instances.
}

This class can be used both on the server and on the client. For the client side, I created a class that extends this one:

class ClientStream<T> extends Stream<T> {

    watch(events: string, selector: string): Stream<Event> {
        // Creates a new ClientStream instance
    }
}

Now the ClientStream class knows about map but the Stream class doesn't know about watch. To circumvent this, functions call a factory method.

protected create<U>(.....): Stream<U> {
    return new Stream<U>(.....)
}
The ClientStream class overrides this function to return ClientStream instances. However, the compiler complains that the returned value is a Stream, not a ClientStream. That can be 'solved' using a cast, but besides being ugly it prevents chaining.

I don't really like this pattern, but I have no other solution that is more elegant. Things I've thought about:

  • Use composition (decorator). Not really an option given the number of methods I would have to proxy through. And I want to be able to add methods to Stream later without having to worry about ClientStream.
  • Mix Stream into ClientStream. More or less the same problem, ClientStream has to know the signatures of the functions that are going to be mixed in (or not? Please tell).
  • Merge these classes into one. This is a last resort, the watch function has no business being on the server.

Do you have a better (more elegant) solution? If you have an idea that gets closer to a more functional style, I'd be happy to hear about it. Thanks!

by Jeroen at August 22, 2016 08:34 AM

I am running GBT in Spark ML for CTR prediction and am getting an exception because of the maxBins parameter

Exception details :

  • Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: DecisionTree requires maxBins (= 32) to be at least as large as the number of values in each categorical feature, but categorical feature 4139 has 16094 values. Considering remove this and other categorical features with a large number of values, or add more training examples.
    at scala.Predef$.require(Predef.scala:233)
    at org.apache.spark.mllib.tree.impl.DecisionTreeMetadata$.buildMetadata(DecisionTreeMetadata.scala:133)
    at org.apache.spark.mllib.tree.GradientBoostedTrees$.org$apache$spark$mllib$tree$GradientBoostedTrees$$boost(GradientBoostedTrees.scala:208)

GBTClassifier gbt = new GBTClassifier()
    .setLabelCol("indexedclick")
    .setFeaturesCol("features_index")
    .setMaxIter(20)
    .setMaxBins(16094)
    .setMaxDepth(30)
    .setMinInfoGain(0.0001)
    .setStepSize(0.00001)
    .setSeed(200)
    .setLossType("logistic")
    .setSubsamplingRate(0.2);

I want to know what the correct maxBins value should be, because even when I set a large value for maxBins I get the same exception.

Any help will be highly appreciated.

by cody123 at August 22, 2016 08:17 AM


Plain Vanilla Interest Rate Swap

I'm trying to build an intuitive understanding of the following

The price at time $t$ of the replicating portfolio of the floating rate receiver is

$$P_t^{swap} = P_{t,t_0} - P_{t,t_N} - \bar{R}\sum_{n=1}^{N}(t_n - t_{n-1})\,P_{t,t_n}$$

(Some notation: $\bar{R}$ is the fixed rate. $P_{t,t_n}$ is the value at time $t$ of a zero coupon bond with maturity $t_n$. And we have future times $t_0,\dots,t_N$.)

My understanding of this is still very young and I have several questions as a result:

So is $P_t^{swap}$ essentially the money it would take to buy the side of the swap (at time $t$) that receives the floating rate, and therefore pays out the fixed rate? e.g. if $P_t^{swap}=0$, you wouldn't make money or lose money in entering this swap.

As we're the floating rate receiver here, we have to pay out the fixed rate every $t_n$ and hence the final term in the expression? It's just been modelled as a sum of zero coupon bonds?

What does $P_{t,t_0}-P_{t,t_N}$ really mean? The value of a zero coupon bond maturing at time $t_0$ minus the value of one maturing at time $t_N$ (I hope that's right to say) is surely always $>0$, as who would rather buy a zero coupon bond that matures at a later time?

And finally, how is the combination of these three terms the value of the floating rate receiver's replicated portfolio?

I hope it's clear what these questions mean and apologies for anything I've missed.
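For intuition, here is a toy numerical check in Python of the receiver-swap replication formula $P_t^{swap} = P_{t,t_0} - P_{t,t_N} - \bar{R}\sum_{n=1}^{N}(t_n - t_{n-1}) P_{t,t_n}$ (hypothetical flat 3% curve and a spot-starting 5Y annual swap; none of these numbers come from the question): when $\bar{R}$ equals the par rate, the three terms cancel and the swap costs nothing to enter.

```python
import math

# Hypothetical flat 3% zero curve; spot-starting 5Y swap, annual fixed leg.
r = 0.03
P = lambda T: math.exp(-r * T)               # zero-coupon bond price P(0, T)
t0, pay_times = 0.0, [1.0, 2.0, 3.0, 4.0, 5.0]

# Annuity: sum of accrual-weighted discount factors (the "final term").
annuity = sum((tn - tp) * P(tn)
              for tp, tn in zip([t0] + pay_times[:-1], pay_times))
# The fixed rate Rbar that makes the swap worth zero at inception.
par_rate = (P(t0) - P(pay_times[-1])) / annuity

swap_value = P(t0) - P(pay_times[-1]) - par_rate * annuity
print(round(par_rate, 4), round(swap_value, 12))
```

With any other fixed rate the swap value is nonzero, positive for the floating receiver when the fixed rate paid to you exceeds the par rate.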

by Phibert at August 22, 2016 07:55 AM



best bid and best quotes from quotes dataset

I have a dataset containing bid and ask quotes for a single day and stock. It has multiple quotes for some of the timestamps. Can I get best bid and best ask quotes from it?
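Yes: grouping by timestamp and taking the highest bid and lowest ask gives the best quotes. A minimal sketch with pandas (the column names here are hypothetical; substitute the dataset's actual ones):

```python
import pandas as pd

# Hypothetical quote records; multiple quotes can share a timestamp.
quotes = pd.DataFrame({
    "timestamp": ["09:30:00", "09:30:00", "09:30:01"],
    "bid": [99.8, 99.9, 99.7],
    "ask": [100.2, 100.1, 100.3],
})

# Best bid is the highest bid, best ask the lowest ask, per timestamp.
bbo = quotes.groupby("timestamp").agg(best_bid=("bid", "max"),
                                      best_ask=("ask", "min"))
print(bbo)
```

One caveat: this treats all quotes sharing a timestamp as simultaneous; if the multiple quotes come from different venues, taking the max bid and min ask across venues at each instant is the NBBO-style consolidation.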

by Polar Bear at August 22, 2016 07:18 AM


Can I predict a price based on a survey in Azure machine learning?

I want to predict my input price based on a list of questions/answers using Azure machine learning. I built a model using Bayesian linear regression, but it seems that it is predicting the price based on the prices I have in my dataset and not based on the Q/A. Am I on the wrong path or am I missing something? Any suggestion would be helpful.

by Tayéhi Mouné at August 22, 2016 07:00 AM


Invalid ZFS file system has no data

Background: I have a FreeNas box with a boot SSD and a 2x 3TB HDD. I know only enough linux and FreeNas to get me in trouble and must have gotten it up and running a while ago. I transferred data to the drive (somehow) and backed it up to CrashPlan (since disappeared). I moved the box to the garage to get it out of the middle of the floor and forgot about it.

Recently, I went to retrieve data off the hard drive by pulling it out of the box and putting it in my Windows box. The drive was seen by disk management with two partitions, but I was unable to assign a drive letter (disk1). Starting to panic, I grabbed the other drive and put it in the Windows box to find that Windows did see it and assign it a drive letter, but it was empty (disk2).

I cloned the drive that I couldn't mount (disk1) to the drive Windows could mount (disk2) so I could go about recovering the partition. I loaded up easeus to recover the gpt partition and found that it said "invalid ZFS file system". I grabbed the SSD from the FreeNas box, put it in the computer I'm working on and booted FreeNas. I was able to get in and saw the FreeNas saw a pool, but it stated that 2.7TB were empty, which is not right.

Here is what I know. If I copied the original data to the FreeNas pool, it would have been setup for disk1 to be mirrored to disk2, so I don't think I destroyed any parity information during the clone. I don't think disk2 had any data, unless the partition was damaged and it stated it was empty when it wasn't. I have the original FreeNas box, but at this point, I don't remember which SATA port each drive was plugged in to (if that makes a difference). I REALLY would like to get this data as it is pictures of my wedding and when we were dating. If I need to leave this to a professional, please recommend someone and tell me what I need to tell them (is my zfs file system invalid?).

by Kyle Harvey at August 22, 2016 06:17 AM



How to use promise functions in JavaScript functional methods like forEach and reduce?

I am using promise function like this:

let res = {approveList: [], rejectList: [], errorId: rv.errorId, errorDesc: rv.errorDesc};
for (let i = 0; i < rv.copyDetailList.length; i++) {
    const item = rv.copyDetailList[i];
    const v = await convertCommonInfo(item);
    if (!item.errorId) {
        res.approveList.push(v);
    } else {
        res.rejectList.push(merge(v, {errorId: item.errorId, errorDesc: item.errorMsg}));
    }
}
This works well, but I want to try a more functional style, and I find that I have to use map and then reduce:

// WORKS, but with two traversals
const dataList = await Promise.all( => convertCommonInfo(item)));

const res = dataList.reduce((obj, v, i) => {
    const item = rv.copyDetailList[i];
    if (!item.errorId) {
        obj.approveList.push(v);
    } else {
        obj.rejectList.push(merge(v, {errorId: item.errorId, errorDesc: item.errorMsg}));
    }
    return obj;
}, {approveList: [], rejectList: [], errorId: rv.errorId, errorDesc: rv.errorDesc});

I find that forEach function can not work:

// DOES NOT WORK: does not wait for the async function
rv.copyDetailList.forEach(async function(item) {
    const v = await convertCommonInfo(item);
    if (!item.errorId) {
        res.approveList.push(v);
    } else {
        res.rejectList.push(merge(v, {errorId: item.errorId, errorDesc: item.errorMsg}));
    }
});
This doesn't work; it just returns the initial value. In fact this puzzles me: since I await the function, why doesn't it work?

Even I want to use reduce function:

// DOES NOT WORK: TypeScript cannot compile this
rv.copyDetailList.reduce(async function(prev, item) {
    const v = await convertCommonInfo(item);
    if (!item.errorId) {
        prev.approveList.push(v);
    } else {
        prev.rejectList.push(merge(v, {errorId: item.errorId, errorDesc: item.errorMsg}));
    }
}, res);

But since I am using Typescript, I got error like this:

error TS2345: Argument of type '(prev: { approveList: any[]; rejectList: any[]; errorId: string; errorDesc: string; }, item: Resp...' is not assignable to parameter of type '(previousValue: { approveList: any[]; rejectList: any[]; errorId: string; errorDesc: string; }, c...'.
Type 'Promise<void>' is not assignable to type '{ approveList: any[]; rejectList: any[]; errorId: string; errorDesc: string; }'.
Property 'approveList' is missing in type 'Promise<void>'.

So I want to know two things:

  1. Why does await inside forEach not work?
  2. Can I use promise function in reduce?

by roger at August 22, 2016 05:49 AM

model measurement in scikit learn

I am working on logistic regression, and have posted a code sample and documentation below. I am wondering whether there is any built-in API in scikit-learn which calculates model measurements, like precision, recall, AUC, etc., for any model's prediction results? Thanks.

Working with Python 2.7.

# -*- coding: utf-8 -*-
"""
Logistic Regression 3-class Classifier

Shown below are the decision boundaries of a logistic-regression classifier
on the iris dataset. The datapoints are colored according to their labels.
"""

# Code source: Gaël Varoquaux
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model, datasets

# import some data to play with
iris = datasets.load_iris()
X =[:, :2]  # we only take the first two features.
Y =

h = .02  # step size in the mesh

logreg = linear_model.LogisticRegression(C=1e5)

# we create an instance of the classifier and fit the data., Y)

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(4, 3))
plt.pcolormesh(xx, yy, Z)

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, edgecolors='k')
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')

plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
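On the measurement question: scikit-learn provides these in `sklearn.metrics`. A minimal sketch on synthetic binary data (separate from the iris plot above, since AUC is most natural in the binary case):

```python
from sklearn import datasets, linear_model, metrics

# Small synthetic binary classification problem.
X, y = datasets.make_classification(n_samples=200, random_state=0)
clf = linear_model.LogisticRegression().fit(X, y)

pred = clf.predict(X)                 # hard labels, for precision/recall
proba = clf.predict_proba(X)[:, 1]    # scores, needed for AUC

precision = metrics.precision_score(y, pred)
recall = metrics.recall_score(y, pred)
auc = metrics.roc_auc_score(y, proba)
print(precision, recall, auc)

# metrics.classification_report(y, pred) prints a per-class summary.
```

For a fair assessment these should of course be computed on held-out data (e.g. via `sklearn.cross_validation.train_test_split` in the 0.17-era API) rather than on the training set as in this sketch.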

regards, Lin

by Lin Ma at August 22, 2016 05:46 AM



How to express parsing logic in Parsec ParserT monad

I was working on "Write Yourself a Scheme in 48 hours" to learn Haskell and I've run into a problem I don't really understand. It's for question 2 from the exercises at the bottom of this section.

The task is to rewrite

import Text.ParserCombinators.Parsec
parseString :: Parser LispVal
parseString = do
                char '"'
                x <- many (noneOf "\"")
                char '"'
                return $ String x

such that quotation marks which are properly escaped (e.g. in "This sentence \" is nonsense") get accepted by the parser.

In an imperative language I might write something like this (roughly pythonic pseudocode):

def parseString(input):
  if input[0] != "\"" or input[len(input) - 1] != "\"":
    return error
  input = input[1:len(input) - 1]  # slice off quotation marks
  output = ""  # This is the 'zero' that accumulates over the following loop
  # If there is a '"' in our string we want to make sure the previous char
  # was '\'
  for n in range(len(input)):
    if input[n] == "\"":
      # the n == 0 check replaces the try/catch for the out-of-bounds look-behind
      if n == 0 or input[n - 1] != "\\":
        return error
    output += input[n]
  return output

I've been looking at the docs for Parsec and I just can't figure out how to work this as a monadic expression.

I got to this:

parseString :: Parser LispVal
parseString = do
                char '"'
                regular <- try $ many (noneOf "\"\\")
                quote <- string "\\\""
                char '"'
                return $ String $ regular ++ quote

But this only works for one quotation mark and it has to be at the very end of the string--I can't think of a functional expression that does the work that my loops and if-statements do in the imperative pseudocode.

I appreciate you taking your time to read this and give me advice.

by lachrimae at August 22, 2016 05:31 AM

numpy reshape confusion with negative shape values

I am always confused about how numpy's reshape handles a negative shape parameter. Here is an example with code and output; could anyone explain what happens with reshape(-1, 1) here? Thanks.

Related document, using Python 2.7.

import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder

S = np.array(['box', 'apple', 'car'])
le = LabelEncoder()
S = le.fit_transform(S)
print(S)
ohe = OneHotEncoder()
one_hot = ohe.fit_transform(S.reshape(-1, 1)).toarray()
print(one_hot)

[1 0 2]
[[ 0.  1.  0.]
 [ 1.  0.  0.]
 [ 0.  0.  1.]]
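For what it's worth, a minimal sketch of the rule: -1 is a placeholder meaning "infer this dimension from the array's total size and the other dimensions", so reshape(-1, 1) turns any array into a single-column 2-D array, which is the shape the sklearn encoders expect.

```python
import numpy as np

a = np.array([1, 0, 2])
col = a.reshape(-1, 1)         # -1 is inferred as 3: a 3x1 column vector
print(col.shape)               # (3, 1)

b = np.arange(12)
print(b.reshape(-1, 1).shape)  # (12, 1): 12 rows inferred
print(b.reshape(3, -1).shape)  # (3, 4): 12 / 3 = 4 columns inferred
```

At most one dimension may be -1, and the remaining dimensions must divide the total size evenly, otherwise numpy raises a ValueError.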

by Lin Ma at August 22, 2016 05:03 AM

Separating a tree regression model based on unique values of one column

I have a data set of 20,000,000 rows. Each row has 30 columns.

One of the columns contains 7000 unique Product Numbers.

Each row contains a Unit Cost value that I would like to predict using all the columns other than the Unit Cost.

I would like to build a unique decision tree or a unique branch of a decision tree to model the data for each Product Number.

Basically partitioning the rows for each Product Number and modelling each Product Number in isolation.

I would like to train a single model in Azure to do this if possible. Any suggestions?
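Outside Azure, the partition-and-model idea can be sketched with scikit-learn on toy data (all names and sizes below are hypothetical stand-ins for the real 20,000,000-row dataset): fit one small regression tree per product number and route each row to its product's model. In Azure ML itself, a common single-model alternative is to include the product number as a categorical feature of one tree model and let the splits do the partitioning.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
n = 300
product = rng.randint(0, 3, n)     # stand-in for the 7000 product numbers
features = rng.randn(n, 4)         # stand-in for the other columns
unit_cost = features[:, 0] * (product + 1) + rng.randn(n) * 0.1

# One independent tree per product number: partition, then fit in isolation.
models = {}
for p in np.unique(product):
    mask = product == p
    models[p] = DecisionTreeRegressor(max_depth=4).fit(features[mask],
                                                       unit_cost[mask])

# Predict by routing each row to its product's model.
pred = np.array([models[p].predict(f.reshape(1, -1))[0]
                 for p, f in zip(product, features)])
print(np.corrcoef(pred, unit_cost)[0, 1])
```

With 7000 products the per-product approach means 7000 small models; whether that beats one large tree with the product number as a feature is an empirical question worth testing on a sample.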

by Ai Inspec at August 22, 2016 04:48 AM


Finding degree two subfield

Let $K=\mathbb{Q}[x]/\langle f(x)\rangle$ where $f(x)$ is irreducible over $\mathbb{Q}$ and has even degree. I want to find $K_2$ such that $\mathbb{Q} \subseteq K_2 \subseteq K$ and $[K_2:\mathbb{Q}]=2$.

If $K$ is a Galois extension of $\mathbb{Q}$, then the discriminant of $f(x)$ solves the problem.

But what if $K$ is not a Galois extension?

by xyz at August 22, 2016 04:40 AM


OneHotEncoder confusion in scikit learn

Using Python 2.7 (miniconda interpreter). I am confused by the example below about OneHotEncoder: why is the enc.n_values_ output [2, 3, 4]? If anyone could help clarify, that would be great.

>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>>[[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])
OneHotEncoder(categorical_features='all', dtype=<... 'float'>,
       handle_unknown='error', n_values='auto', sparse=True)
>>> enc.n_values_
array([2, 3, 4])
>>> enc.feature_indices_
array([0, 2, 5, 9])
>>> enc.transform([[0, 1, 1]]).toarray()
array([[ 1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.]])
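A likely explanation, sketched with plain numpy (this mirrors the default `n_values='auto'` behavior of that era's OneHotEncoder): with integer input, the encoder assumes each column takes values 0..max, so it reports max + 1 categories per column, and `feature_indices_` is the running cumulative sum of those counts, giving each column's offset into the one-hot output.

```python
import numpy as np

X = np.array([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])

# Per-column maxima are [1, 2, 3], so the assumed category counts are:
print(X.max(axis=0) + 1)                        # [2 3 4] -> n_values_

# Offsets of each column's block in the one-hot vector (cumulative sum):
print(np.cumsum(np.r_[0, X.max(axis=0) + 1]))   # [0 2 5 9] -> feature_indices_
```

That also explains the transform of [0, 1, 1]: the ones land at positions 0+0, 2+1, and 5+1 of the length-9 output, matching the array shown above.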

regards, Lin

by Lin Ma at August 22, 2016 04:12 AM




Borrower, platform, SPV relationship with borrower payment dependent notes?

As you know, real estate crowd funding platforms are taking off at the moment. Platforms connect investors to real estate assets. I am having an issue with understanding how exactly borrower payment dependent notes work. I simply want more details about the SPVs involved. Is the underlying loan simply a loan and not a debt security? Does the platform directly lend the money or do they do it through a different SPV? And then the platform creates a different SPV to which it issues these borrower payment dependent notes? Would the investors then be buying a pro rata share of the SPV? Thank you so much.

by user3138766 at August 22, 2016 03:36 AM



How to build a simple computing engine for Hadoop?

I am a student with some basic knowledge. I want to build a SIMPLE (minimum-dependency) computing engine for Hadoop 2 for my research, similar to Apache Tez. But I don't know how to do it. Could you show me the steps I need to take to achieve my goal?

  • What should I learn first? From basic to advanced
  • What books do I need to read?
  • Tutorial links: blogs, youtube,...

[Optional] Could you show me steps from basic to advanced to optimize my engine? (If I can finish the simple version)

I don't care about how long it takes, I just want to learn.


by hminle at August 22, 2016 02:40 AM



volatility of a mid curve option


When checking the volatility surface for, let's say, a swaption, where the option expires in 1Y and the underlying swap starts in 1Y and ends in 5Y, one would check the volatility surface for the quoted volatilities and pick the volatility at the 1Y x 5Y point.

What happens to the volatility of a mid curve option? How do you relate or interpolate the volatility in this case? Let's say the option expires in 1Y, and the underlying swap starts in 6Y and ends 5Y after its start. Where on the volatility surface should the volatility of a mid curve option be situated? In other words, how do you get the volatility for the 6Y forward 5Y swap for an option that expires in 1Y?

by Kriska at August 22, 2016 02:08 AM


Understanding LEADING and TRAILING operations of an operator precedence grammar

I want to understand what the LEADING and TRAILING of non-terminal in an operator precedence grammar physically mean.

I am confused by the various definitions I have read on them.
I understand that the LEADING of a non-terminal is the first terminal which can be present in its derivation.
On the other hand, the TRAILING of a non-terminal is the last terminal which can be present in its derivation.

In the following example:

E   ->  E   +   T      -- I
E   ->  T              -- II
T   ->  T   *   F      -- III
T   ->  F              -- IV
F   ->  (   E   )      -- V
F   ->  id             -- VI

By my understanding,

LEADING(E) = { +, *, (, id }
LEADING(T) = { *, (, id }
LEADING(F) = { (, id }

This turns out fine, but my problem is in the TRAILING.

TRAILING(F) = { id, ) }
TRAILING(T) = TRAILING(F) = { id, ) }          -- (1)
TRAILING(E) = TRAILING(T) = { id, ) }          -- (2)

Reason for (2) is that according to productions I and II, the last terminal of the derivation of E will be last terminals in the derivation of T. Hence, TRAILING(E) = TRAILING(T).

Unfortunately the solution to this problem states:

TRAILING(F) = { id, ) }
TRAILING(T) = TRAILING(F) `union` { * } = { *, id, ) }
TRAILING(E) = TRAILING(T) `union` { + } = { +, *, id, ) }

I don't see how * or + can be the last terminals in the derivation of E. Any derivation of E will always end with either an id or ). Similarly, case for T.
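The usual textbook definitions are slightly broader than "the last terminal of a derivation": $a \in \mathrm{TRAILING}(A)$ if $A \Rightarrow^{+} \gamma\,a\,B$ or $A \Rightarrow^{+} \gamma\,a$ for a non-terminal $B$, i.e. the terminal may still be followed by one trailing non-terminal. Under that reading, $E \to E + T$ puts $+$ in TRAILING($E$), since $+$ is followed only by the non-terminal $T$. A small fixpoint sketch (my own illustration) that computes both sets under these definitions and reproduces the posted solution:

```python
# Compute LEADING/TRAILING for an operator grammar by fixpoint iteration.
# LEADING(A) collects terminals reachable as the first terminal of a
# sentential form derived from A (possibly after one leading non-terminal);
# TRAILING is the mirror image, computed on the reversed right-hand sides.
NONTERMINALS = {"E", "T", "F"}
PRODUCTIONS = [
    ("E", ["E", "+", "T"]),
    ("E", ["T"]),
    ("T", ["T", "*", "F"]),
    ("T", ["F"]),
    ("F", ["(", "E", ")"]),
    ("F", ["id"]),
]

def leading_or_trailing(reverse=False):
    sets = {A: set() for A in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for A, rhs in PRODUCTIONS:
            syms = rhs[::-1] if reverse else rhs
            found = set()
            if syms[0] in NONTERMINALS:
                found |= sets[syms[0]]          # e.g. E -> E + T pulls in sets[E]
                if len(syms) > 1 and syms[1] not in NONTERMINALS:
                    found.add(syms[1])          # the '+' in E -> E + T
            else:
                found.add(syms[0])              # e.g. '(' in F -> ( E )
            if not found <= sets[A]:
                sets[A] |= found
                changed = True
    return sets

leading = leading_or_trailing()
trailing = leading_or_trailing(reverse=True)
print(leading)
print(trailing)
```

Running this yields TRAILING(E) = { +, *, id, ) } and TRAILING(T) = { *, id, ) }, matching the given solution: the operators enter via the "terminal followed by a single non-terminal" clause of the definition.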

by Likhit at August 22, 2016 02:06 AM


Is there a terminology for these concepts?

In $P/poly$, essentially we want to find polynomial-sized advice strings that help solve problems of some fixed length in polynomial time.

If a problem is $NP$-complete then it has a short certificate for YES instances. If NO instances also have short certificates that can be verified in polynomial time, then $NP=coNP$. Assume we live in a world where $NP\neq coNP$ and $NP\cup coNP\subsetneq P/poly$ hold.

Then there are no short proofs for NO instances.

However, could there be a scenario where we have short certificates for every length-$n$ NO instance of an NP-complete problem, verifiable using a polynomial-size circuit of size $n^c$ (which may take exponential time to compute)?

What if we instead have short certificates for NO instances verifiable by a randomized polynomial-time algorithm with success probability $2/3$ at any length?

What if there is a randomized polynomial-time algorithm which refutes NO instances with probability $1/2+1/2^{n^{\alpha}}$, where $\alpha\in(0,1)$, at fixed input length $n$?

Are such scenarios possible and, if so, is there a terminology for these concepts?

We do not worry about YES instances.

For example, in coding theory, if we ask 'is there a codeword of weight $< w$?', a minimum-weight codeword is the standard short certificate that can be verified in polynomial time for YES instances, while for NO instances we can have a scenario where short certificates can be verified in P/poly, BPP, or PP.

by Turbo at August 22, 2016 01:40 AM

arXiv Networking and Internet Architecture

A Concise Forwarding Information Base for Scalable and Fast Flat Name Switching. (arXiv:1608.05699v1 [cs.NI])

Forwarding information base (FIB) scalability is a fundamental problem of numerous new network architectures that propose to use location-independent network names. We propose Concise, a FIB design that uses very little memory to support fast query of a large number of location-independent names. Concise makes use of minimal perfect hashing and relies on the SDN framework and supports fast name classification. Our conceptual contribution of Concise is to optimize the memory efficiency and query speed in the data plane and move the relatively complex construction and update components to the resource-rich control plane. We implemented Concise on three platforms. Experimental results show that Concise uses significantly smaller memory to achieve faster query speed compared to existing FIBs for flat name switching.

by Ye Yu, Djamal Belazzougui, Chen Qian, Qin Zhang at August 22, 2016 01:30 AM

Automata Theory Approach to Predicate Intuitionistic Logic. (arXiv:1608.05698v1 [cs.LO])

Predicate intuitionistic logic is a well established fragment of dependent types. According to the Curry-Howard isomorphism, proof construction in the logic corresponds to synthesis of a program whose type is a given formula. We present a model of automata that can handle proof construction in full intuitionistic first-order logic. The automata are constructed in such a way that any successful run corresponds directly to a normal proof in the logic. This makes it possible to discuss formal languages of proofs or programs, the closure properties of the automata and their connections with the traditional logical connectives.

by Maciej Zielenkiewicz, Aleksy Schubert at August 22, 2016 01:30 AM

Revisiting Reuse in Main Memory Database Systems. (arXiv:1608.05678v1 [cs.DB])

Reusing intermediates in databases to speed-up analytical query processing has been studied in the past. Existing solutions typically require intermediate results of individual operators to be materialized into temporary tables to be considered for reuse in subsequent queries. However, these approaches are fundamentally ill-suited for use in modern main memory databases. The reason is that modern main memory DBMSs are typically limited by the bandwidth of the memory bus, thus query execution is heavily optimized to keep tuples in the CPU caches and registers. To that end, adding additional materialization operations into a query plan not only add additional traffic to the memory bus but more importantly prevent the important cache- and register-locality opportunities resulting in high performance penalties.

In this paper we study a novel reuse model for intermediates, which caches internal physical data structures materialized during query processing (due to pipeline breakers) and externalizes them so that they become reusable for upcoming operations. We focus on hash tables, the most commonly used internal data structure in main memory databases to perform join and aggregation operations. As queries arrive, our reuse-aware optimizer reasons about the reuse opportunities for hash tables, employing cost models that take into account hash table statistics together with the CPU and data movement costs within the cache hierarchy. Experimental results, based on our HashStash prototype demonstrate performance gains of $2\times$ for typical analytical workloads with no additional overhead for materializing intermediates.

by Kayhan Dursun, Carsten Binnig, Ugur Cetintemel, Tim Kraska at August 22, 2016 01:30 AM

Hierarchical Shape Abstraction for Analysis of Free-List Memory Allocators. (arXiv:1608.05676v1 [cs.PL])

We propose a hierarchical abstract domain for the analysis of free-list memory allocators that tracks shape and numerical properties about both the heap and the free lists. Our domain is based on Separation Logic extended with predicates that capture the pointer arithmetics constraints for the heap-list and the shape of the free-list. These predicates are combined using a hierarchical composition operator to specify the overlapping of the heap-list by the free-list. In addition to expressiveness, this operator leads to a compositional and compact representation of abstract values and simplifies the implementation of the abstract domain. The shape constraints are combined with numerical constraints over integer arrays to track properties about the allocation policies (best-fit, first-fit, etc). Such properties are out of the scope of the existing analyzers. We implemented this domain and we show its effectiveness on several implementations of free-list allocators.

by Bin Fang, Mihaela Sighireanu at August 22, 2016 01:30 AM

lpopt: A Rule Optimization Tool for Answer Set Programming. (arXiv:1608.05675v2 [cs.LO] UPDATED)

State-of-the-art answer set programming (ASP) solvers rely on a program called a grounder to convert non-ground programs containing variables into variable-free, propositional programs. The size of this grounding depends heavily on the size of the non-ground rules, and thus, reducing the size of such rules is a promising approach to improve solving performance. To this end, in this paper we announce lpopt, a tool that decomposes large logic programming rules into smaller rules that are easier to handle for current solvers. The tool is specifically tailored to handle the standard syntax of the ASP language (ASP-Core) and makes it easier for users to write efficient and intuitive ASP programs, which would otherwise often require significant hand-tuning by expert ASP engineers. It is based on an idea proposed by Morak and Woltran (2012) that we extend significantly in order to handle the full ASP syntax, including complex constructs like aggregates, weak constraints, and arithmetic expressions. We present the algorithm, the theoretical foundations on how to treat these constructs, as well as an experimental evaluation showing the viability of our approach.

by Manuel Bichler, Michael Morak, Stefan Woltran at August 22, 2016 01:30 AM

POLYPATH: Supporting Multiple Tradeoffs for Interaction Latency. (arXiv:1608.05654v1 [cs.OS])

Modern mobile systems use a single input-to-display path to serve all applications. In meeting the visual goals of all applications, the path has a latency inadequate for many important interactions. To accommodate the different latency requirements and visual constraints by different interactions, we present POLYPATH, a system design in which application developers (and users) can choose from multiple path designs for their application at any time. Because a POLYPATH system asks for two or more path designs, we present a novel fast path design, called Presto. Presto reduces latency by judiciously allowing frame drops and tearing.

We report an Android 5-based prototype of POLYPATH with two path designs: Android legacy and Presto. Using this prototype, we quantify the effectiveness, overhead, and user experience of POLYPATH, especially Presto, through both objective measurements and subjective user assessment. We show that Presto reduces the latency of legacy touchscreen drawing applications by almost half; and more importantly, this reduction is orthogonal to that of other popular approaches and is achieved without any user-noticeable negative visual effect. When combined with touch prediction, Presto is able to reduce the touch latency below 10 ms, a remarkable achievement without any hardware support.

by <a href="">Min Hong Yun</a>, <a href="">Songtao He</a>, <a href="">Lin Zhong</a> at August 22, 2016 01:30 AM

Polynomial Kernels and Wideness Properties of Nowhere Dense Graph Classes. (arXiv:1608.05637v1 [cs.DM])

Nowhere dense classes of graphs are very general classes of uniformly sparse graphs with several seemingly unrelated characterisations. From an algorithmic perspective, a characterisation of these classes in terms of uniform quasi-wideness, a concept originating in finite model theory, has proved to be particularly useful. Uniform quasi-wideness is used in many fpt-algorithms on nowhere dense classes. However, the existing constructions showing the equivalence of nowhere denseness and uniform quasi-wideness imply a non-elementary blow up in the parameter dependence of the fpt-algorithms, making them infeasible in practice.

As a first main result of this paper, we use tools from logic, in particular from a subfield of model theory known as stability theory, to establish polynomial bounds for the equivalence of nowhere denseness and uniform quasi-wideness.

A powerful method in parameterized complexity theory is to compute a problem kernel in a pre-computation step, that is, to reduce the input instance in polynomial time to a sub-instance of size bounded in the parameter only (independently of the input graph size). Our new tools allow us to obtain for every fixed value of $r$ a polynomial kernel for the distance-$r$ dominating set problem on nowhere dense classes of graphs. This result is particularly interesting, as it implies that for every class $\mathcal{C}$ of graphs which is closed under subgraphs, the distance-$r$ dominating set problem admits a kernel on $\mathcal{C}$ for every value of $r$ if, and only if, it admits a polynomial kernel for every value of $r$ (under the standard assumption of parameterized complexity theory that $\mathrm{FPT} \neq W[2]$).

by <a href="">Stephan Kreutzer</a>, <a href="">Roman Rabinovich</a>, <a href="">Sebastian Siebertz</a> at August 22, 2016 01:30 AM

Symbolic Abstract Contract Synthesis in a Rewriting Framework. (arXiv:1608.05619v1 [cs.PL])

We propose an automated technique for inferring software contracts from programs that are written in a non-trivial fragment of C, called KernelC, that supports pointer-based structures and heap manipulation. Starting from the semantic definition of KernelC in the K framework, we enrich the symbolic execution facilities recently provided by K with novel capabilities for assertion synthesis that are based on abstract subsumption. Roughly speaking, we define an abstract symbolic technique that explains the execution of a (modifier) C function by using other (observer) routines in the same program. We implemented our technique in the automated tool KindSpec 2.0, which generates logical axioms that express pre- and post-condition assertions by defining the precise input/output behaviour of the C routines.

by <a href="">Mar&#xed;a Alpuente</a>, <a href="">Daniel Pardo</a>, <a href="">Alicia Villanueva</a> at August 22, 2016 01:30 AM

CurryCheck: Checking Properties of Curry Programs. (arXiv:1608.05617v1 [cs.PL])

We present CurryCheck, a tool to automate the testing of programs written in the functional logic programming language Curry. CurryCheck executes unit tests as well as property tests which are parameterized over one or more arguments. In the latter case, CurryCheck tests these properties by systematically enumerating test cases so that, for smaller finite domains, CurryCheck can actually prove properties. Unit tests and properties can be defined in a Curry module without being exported. Thus, they are also useful to document the intended semantics of the source code. Furthermore, CurryCheck also supports the automated checking of specifications and contracts occurring in source programs. Hence, CurryCheck is a useful tool that contributes to the property- and specification-based development of reliable and well tested declarative programs.

by <a href="">Michael Hanus</a> at August 22, 2016 01:30 AM

A short-key one-time pad cipher. (arXiv:1608.05613v1 [cs.CR])

A process for the secure transmission of data is presented that has to a certain degree the advantages of the one-time pad (OTP) cipher, that is, simplicity, speed, and information-theoretic security, but overcomes its fundamental weakness, the necessity of securely exchanging a key that is as long as the message. For each transmission, a dedicated one-time pad is generated for encrypting and decrypting the plaintext message. This one-time pad is built from a randomly chosen set of basic keys taken from a public library. Because the basic keys can be chosen and used multiple times, the method is called multiple-time pad (MTP) cipher. The information on the choice of basic keys is encoded in a short keyword that is transmitted by secure means. The process is made secure against known-plaintext attacks by additional design elements. The process is particularly useful for high-speed transmission of mass data and video or audio streaming.
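In spirit, the construction can be sketched in a few lines (all names and parameters below are illustrative assumptions; the paper's actual scheme adds further design elements to resist known-plaintext attacks):

```python
import secrets

# Hypothetical public library of basic keys (in practice: large, fixed, public).
LIBRARY = [secrets.token_bytes(32) for _ in range(256)]

def build_pad(key_indices, length):
    """Combine the chosen basic keys into a one-time pad of the given
    length by XOR-ing them together, cycling each basic key as needed."""
    pad = bytearray(length)
    for idx in key_indices:
        key = LIBRARY[idx]
        for i in range(length):
            pad[i] ^= key[i % len(key)]
    return bytes(pad)

def encrypt(message, key_indices):
    pad = build_pad(key_indices, len(message))
    return bytes(m ^ p for m, p in zip(message, pad))

decrypt = encrypt  # XOR is its own inverse

# The short keyword transmitted by secure means is just the index list.
# Sample without replacement: duplicate indices would cancel under XOR.
keyword = secrets.SystemRandom().sample(range(256), 8)
ct = encrypt(b"high-speed mass data", keyword)
assert decrypt(ct, keyword) == b"high-speed mass data"
```

XOR-ing several library keys together is what lets short key material stand in for a message-length pad; the security argument, of course, rests on the design details this sketch omits.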

by <a href="">Uwe Starossek</a> at August 22, 2016 01:30 AM

On Joining Graphs. (arXiv:1608.05594v1 [cs.DB])

In the graph database literature the term "join" does not refer to an operator used to merge two graphs. In particular, a counterpart of the relational join is not present in existing graph query languages, and consequently no efficient algorithms have been developed for this operator.

This paper provides two main contributions. First, we define a binary graph join operator that acts on the vertices as a standard relational join and combines the edges according to a user-defined semantics. Then we propose the "CoGrouped Graph Conjunctive $\theta$-Join" algorithm running over data indexed in secondary memory. Our implementation outperforms the execution of the same operation in Cypher and SPARQL on major existing graph database management systems by at least one order of magnitude, also including indexing and loading time.
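As a toy illustration of the operator's shape (the data layout and the conjunctive edge semantics below are our assumptions for exposition, not the paper's formal definitions):

```python
# Toy sketch of a binary graph join: vertices are paired like a relational
# equi-join on a shared attribute; edges are combined by a user-supplied rule.
def graph_join(g1, g2, edge_semantics):
    # A graph is (vertices: dict id -> attrs, edges: set of (src, dst)).
    v1, e1 = g1
    v2, e2 = g2
    # Vertex join: pair up vertices with an equal join attribute (here "key").
    joined = {}
    for a, attr_a in v1.items():
        for b, attr_b in v2.items():
            if attr_a["key"] == attr_b["key"]:
                joined[(a, b)] = {**attr_a, **attr_b}
    # Edge combination according to the user-defined semantics.
    edges = {((a, b), (c, d))
             for (a, b) in joined
             for (c, d) in joined
             if edge_semantics((a, c) in e1, (b, d) in e2)}
    return joined, edges

# Conjunctive semantics: an edge exists only if it exists in both graphs.
def conjunctive(in_g1, in_g2):
    return in_g1 and in_g2
```

The algorithmic contribution of the paper is doing this over secondary-memory indexes; the sketch only shows the operator's semantics.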

by <a href="">Giacomo Bergami</a>, <a href="">Matteo Magnani</a>, <a href="">Danilo Montesi</a> at August 22, 2016 01:30 AM

Logical Data Independence in the 21st Century -- Co-Existing Schema Versions with InVerDa. (arXiv:1608.05564v1 [cs.DB])

We present InVerDa, a tool for end-to-end support of co-existing schema versions within one database. While it is state of the art to run multiple versions of a continuously developed application concurrently, the same is hard for databases. In order to keep multiple co-existing schema versions alive, that all access the same data set, developers usually employ handwritten delta code (e.g. views and triggers in SQL). This delta code is hard to write and hard to maintain: if a database administrator decides to adapt the physical table schema, all handwritten delta code needs to be adapted as well, which is expensive and error-prone in practice. With InVerDa, developers use a simple bidirectional database evolution language in the first place that carries enough information to generate all the delta code automatically. Without additional effort, new schema versions become immediately accessible and data changes in any version are visible in all schema versions at the same time. We formally validate the correctness of this propagation. InVerDa also allows for easily changing the physical table designs without affecting the availability of co-existing schema versions. This greatly increases robustness (264 times fewer lines of code) and allows for significant performance optimization.

by <a href="">Kai Herrmann</a>, <a href="">Hannes Voigt</a>, <a href="">Andreas Behrend</a>, <a href="">Jonas Rausch</a>, <a href="">Wolfgang Lehner</a> at August 22, 2016 01:30 AM

Relationship between the Reprogramming Determinants of Boolean Networks and their Interaction Graph. (arXiv:1608.05552v1 [cs.DM])

In this paper, we address the formal characterization of targets triggering cellular trans-differentiation in the scope of Boolean networks with asynchronous dynamics. Given two fixed points of a Boolean network, we are interested in all the combinations of mutations which allow switching from one fixed point to the other, either possibly or inevitably. In the case of existential reachability, we prove that the set of nodes to (permanently) flip are only and necessarily in certain connected components of the interaction graph. In the case of inevitable reachability, we provide an algorithm to identify a subset of possible solutions.

by <a href="">Hugues Mandon</a>, <a href="">Stefan Haar</a>, <a href="">Lo&#xef;c Paulev&#xe9;</a> at August 22, 2016 01:30 AM

Goal-Oriented Reduction of Automata Networks. (arXiv:1608.05548v1 [cs.LO])

We consider networks of finite-state machines having local transitions conditioned by the current state of other automata. In this paper, we depict a reduction procedure tailored for a given reachability property of the form "from global state s there exists a sequence of transitions leading to a state where an automaton g is in a local state T". By exploiting a causality analysis of the transitions within the individual automata, the proposed reduction removes local transitions while preserving all the minimal traces that satisfy the reachability property. The complexity of the procedure is polynomial in the total number of local states and transitions, and exponential in the number of local states within one automaton. Applied to automata networks modelling dynamics of biological systems, we observe that the reduction significantly shrinks the reachable state space, enhancing the tractability of the model-checking of large networks.

by <a href="">Lo&#xef;c Paulev&#xe9;</a> at August 22, 2016 01:30 AM

A Survey on Routing in Anonymous Communication Protocols. (arXiv:1608.05538v1 [cs.CR])

The Internet has undergone dramatic changes in the past 15 years, and now forms a global communication platform that billions of users rely on for their daily activities. While this transformation has brought tremendous benefits to society, it has also created new threats to online privacy, ranging from profiling of users for monetizing personal information to nearly omnipotent governmental surveillance. As a result, public interest in systems for anonymous communication has drastically increased. Several such systems have been proposed in the literature, each of which offers anonymity guarantees in different scenarios and under different assumptions, reflecting the plurality of approaches for how messages can be anonymously routed to their destination. Understanding this space of competing approaches with their different guarantees and assumptions is vital for users to understand the consequences of different design options.

In this work, we survey previous research on designing, developing, and deploying systems for anonymous communication. To this end, we provide a taxonomy for clustering all prevalently considered approaches (including Mixnets, DC-nets, onion routing, and DHT-based protocols) with respect to their unique routing characteristics, deployability, and performance. This, in particular, encompasses the topological structure of the underlying network; the routing information that has to be made available to the initiator of the conversation; the underlying communication model; and performance-related indicators such as latency and communication layer. Our taxonomy and comparative assessment provide important insights about the differences between the existing classes of anonymous communication protocols, and it also helps to clarify the relationship between the routing characteristics of these protocols, and their performance and scalability.

by <a href="">Fatemeh Shirazi</a>, <a href="">Milivoj Simeonovski</a>, <a href="">Muhammad Rizwan Asghar</a>, <a href="">Michael Backes</a>, <a href="">Claudia Diaz</a> at August 22, 2016 01:30 AM

Private and Truthful Aggregative Game for Large-Scale Spectrum Sharing. (arXiv:1608.05537v1 [cs.GT])

Thanks to the rapid development of information technology, the size of wireless networks becomes larger and larger, which makes spectrum resources more precious than ever before. To improve the efficiency of spectrum utilization, game theory has been applied to study spectrum sharing in wireless networks for a long time. However, the scale of the wireless networks in existing studies is relatively small. In this paper, we introduce a novel game and model the spectrum sharing problem as an aggregative game for large-scale, heterogeneous, and dynamic networks. The massive usage of spectrum also leads to easier privacy divulgence of spectrum users' actions, which calls for privacy and truthfulness guarantees in wireless networks. In a large decentralized scenario, each user has no prior knowledge of other users' decisions, which forms an incomplete information game. A "weak mediator", e.g., the base station or licensed spectrum regulator, is introduced and turns this game into a complete one, which is essential to reach a Nash equilibrium (NE). By utilizing past experience of channel access, we propose an online learning algorithm to improve the utility of each user, achieving NE over time. Our learning algorithm also provides a no-regret guarantee to each user. Our mechanism admits an approximate ex-post NE. We also prove that it satisfies joint differential privacy and is incentive-compatible. Efficiency of the approximate NE is evaluated, and novel scaling-law results are disclosed. Finally, we provide simulation results to verify our analysis.

by <a href="">Pan Zhou</a>, <a href="">Wenqi Wei</a>, <a href="">Kaigui Bian</a>, <a href="">Dapeng Oliver Wu</a>, <a href="">Yuchong Hu</a> at August 22, 2016 01:30 AM

Towards Reversible Computation in Erlang. (arXiv:1608.05521v1 [cs.PL])

In a reversible language, any forward computation can be undone by a finite sequence of backward steps. Reversible computing has been studied in the context of different programming languages and formalisms, where it has been used for debugging and for enforcing fault-tolerance, among others. In this paper, we consider a subset of Erlang, a concurrent language based on the actor model. We formally introduce a reversible semantics for this language. To the best of our knowledge, this is the first attempt to define a reversible semantics for Erlang.

by <a href="">Naoki Nishida</a>, <a href="">Adri&#xe1;n Palacios</a>, <a href="">Germ&#xe1;n Vidal</a> at August 22, 2016 01:30 AM

Network Volume Anomaly Detection and Identification in Large-scale Networks based on Online Time-structured Traffic Tensor Tracking. (arXiv:1608.05493v1 [cs.NI])

This paper addresses network anomography, that is, the problem of inferring network-level anomalies from indirect link measurements. This problem is cast as a low-rank subspace tracking problem for normal flows under incomplete observations, and an outlier detection problem for abnormal flows. Since traffic data is large-scale time-structured data accompanied with noise and outliers under partial observations, an efficient modeling method is essential. To this end, this paper proposes an online subspace tracking of a Hankelized time-structured traffic tensor for normal flows based on the Candecomp/PARAFAC decomposition exploiting the recursive least squares (RLS) algorithm. We estimate abnormal flows as outlier sparse flows via sparsity maximization in the underlying under-constrained linear-inverse problem. A major advantage is that our algorithm estimates normal flows by low-dimensional matrices with time-directional features as well as the spatial correlation of multiple links without using the past observed measurements and the past model parameters. Extensive numerical evaluations show that the proposed algorithm achieves faster convergence per iteration of model approximation, and better volume anomaly detection performance compared to state-of-the-art algorithms.

by <a href="">Hiroyuki Kasai</a>, <a href="">Wolfgang Kellerer</a>, <a href="">Martin Kleinsteuber</a> at August 22, 2016 01:30 AM

Efficient Computation of Slepian Functions on the Sphere. (arXiv:1608.05479v1 [cs.DM])

In this work, we develop a new method for the fast and memory-efficient computation of Slepian functions on the sphere. Slepian functions, which arise as the solution of the Slepian concentration problem on the sphere, have desirable properties for applications where measurements are only available within a spatially limited region on the sphere and/or a function is required to be analyzed over the spatially limited region. Slepian functions are currently not easily computed for large band-limits (L > 100) for an arbitrary spatial region due to high computational and large memory storage requirements. For the special case of a polar cap, the symmetry of the region enables the decomposition of the Slepian concentration problem into smaller sub-problems and consequently the efficient computation of Slepian functions for large band-limits. By exploiting the efficient computation of Slepian functions for the polar cap region on the sphere, we develop a formulation, supported by a fast algorithm, for the computation of Slepian functions for an arbitrary spatial region to enable the analysis of modern data-sets that support large band-limits. For the proposed algorithm, we carry out accuracy analysis, computational complexity analysis and review of memory storage requirements. We illustrate, through numerical experiments, that the proposed method enables faster computation, and has smaller storage requirements, while allowing for sufficiently accurate computation of the Slepian functions.

by <a href="">Alice P. Bates</a>, <a href="">Zubair Khalid</a>, <a href="">Rodney A. Kennedy</a> at August 22, 2016 01:30 AM

Papers presented at the 32nd International Conference on Logic Programming (ICLP 2016). (arXiv:1608.05440v1 [cs.PL])

This is the list of the full papers accepted for presentation at the 32nd International Conference on Logic Programming, New York City, USA, October 18-21, 2016.

In addition to the main conference itself, ICLP hosted four pre-conference workshops, the Autumn School on Logic Programming, and a Doctoral Consortium.

The final versions of the full papers will be published in a special issue of the journal Theory and Practice of Logic Programming (TPLP). We received eighty-eight abstract submissions, of which twenty-seven papers were accepted for publication as TPLP rapid communications.

Papers deemed of sufficiently high quality to be presented at the conference, but not to appear in TPLP, will be published as Technical Communications in the OASIcs series. Fifteen papers fell into this category.

by <a href="">Manuel Carro</a>, <a href="">Andy King</a> (Eds.) at August 22, 2016 01:30 AM

Quantum Entanglement Distribution in Next-Generation Wireless Communication Systems. (arXiv:1608.05188v1 [quant-ph] CROSS LISTED)

In this work we analyze the distribution of quantum entanglement over communication channels in the millimeter-wave regime. The motivation for such a study is the possibility for next-generation wireless networks (beyond 5G) to accommodate such a distribution directly - without the need to integrate additional optical communication hardware into the transceivers. Future wireless communication systems are bound to require some level of quantum communications capability. We find that direct quantum-entanglement distribution in the millimeter-wave regime is indeed possible, but that its implementation will be very demanding from both a system-design perspective and a channel-requirement perspective.

by <a href="">Nedasadat Hosseinidehaj</a>, <a href="">Robert Malaney</a> at August 22, 2016 01:30 AM

Shortest unique palindromic substring queries in optimal time

Authors: Yuto Nakashima, Hiroe Inoue, Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda
Download: PDF
Abstract: A palindrome is a string that reads the same forward and backward. A palindromic substring $P$ of a string $S$ is called a shortest unique palindromic substring ($\mathit{SUPS}$) for an interval $[x, y]$ in $S$, if $P$ occurs exactly once in $S$, this occurrence of $P$ contains interval $[x, y]$, and every palindromic substring of $S$ which contains interval $[x, y]$ and is shorter than $P$ occurs at least twice in $S$. The $\mathit{SUPS}$ problem is, given a string $S$, to preprocess $S$ so that for any subsequent query interval $[x, y]$ all the $\mathit{SUPS}\mbox{s}$ for interval $[x, y]$ can be answered quickly. We present an optimal solution to this problem. Namely, we show how to preprocess a given string $S$ of length $n$ in $O(n)$ time and space so that all $\mathit{SUPS}\mbox{s}$ for any subsequent query interval can be answered in $O(k+1)$ time, where $k$ is the number of outputs.
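As a point of reference, a brute-force solver for a single query follows directly from the definition (purely illustrative; the paper's contribution is achieving O(n)-time preprocessing with O(k+1)-time queries):

```python
def sups(s, x, y):
    """Naive SUPS for interval [x, y] (0-indexed, inclusive): the shortest
    palindromic substrings containing [x, y] that occur exactly once in s.
    Returns (start, end) index pairs."""
    def occurrences(sub):
        # Count occurrences allowing overlaps (str.count would miss them).
        return sum(1 for i in range(len(s) - len(sub) + 1)
                   if s.startswith(sub, i))

    best, best_len = [], None
    for i in range(x + 1):              # substring must start at or before x
        for j in range(y, len(s)):      # ... and end at or after y
            sub = s[i:j + 1]
            if sub != sub[::-1] or occurrences(sub) != 1:
                continue                # not a palindrome, or not unique
            length = j - i + 1
            if best_len is None or length < best_len:
                best, best_len = [(i, j)], length
            elif length == best_len:
                best.append((i, j))
    return best
```

Each query here costs cubic time in the worst case, which is exactly the gap the linear-time index of the paper closes.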

August 22, 2016 01:04 AM

Thrill: High-Performance Algorithmic Distributed Batch Data Processing with C++

Authors: Timo Bingmann, Michael Axtmann, Emanuel Jöbstl, Sebastian Lamm, Huyen Chau Nguyen, Alexander Noe, Sebastian Schlag, Matthias Stumpp, Tobias Sturm, Peter Sanders
Download: PDF
Abstract: We present the design and a first performance evaluation of Thrill -- a prototype of a general purpose big data processing framework with a convenient data-flow style programming interface. Thrill is somewhat similar to Apache Spark and Apache Flink with at least two main differences. First, Thrill is based on C++ which enables performance advantages due to direct native code compilation, a more cache-friendly memory layout, and explicit memory management. In particular, Thrill uses template meta-programming to compile chains of subsequent local operations into a single binary routine without intermediate buffering and with minimal indirections. Second, Thrill uses arrays rather than multisets as its primary data structure which enables additional operations like sorting, prefix sums, window scans, or combining corresponding fields of several arrays (zipping). We compare Thrill with Apache Spark and Apache Flink using five kernels from the HiBench suite. Thrill is consistently faster and often several times faster than the other frameworks. At the same time, the source code has a similar level of simplicity and abstraction.

August 22, 2016 01:00 AM


Efficiently storing real-time intraday data in an application agnostic way

What would be the best approach to handle real-time intraday data storage?

For personal research I've always imported from flat files only into memory (historical EOD), so I don't have much experience with this. I'm currently working on a side project, which would require daily stock quotes updated every minute from an external feed. For the time being, I suppose any popular database solution should handle it without sweating too much in this scenario. But I would like the adopted solution to scale easily when real-time ticks become necessary.

A similar problem has been mentioned by Marko, though it was mostly specific to R. I'm looking for a universal data store accessible both to lightweight web front-ends (PHP/Ruby/Flex) and to an analytical back-end (C++, R or Python, I don't know yet).

From what chrisaycock mentioned, column-oriented databases should be the most viable solution. And it seems to be the case.

But I'm not sure I understand all the intricacies of column-oriented storage in some exemplary usage scenarios:

  • Fetching all or subset of price data for a specific ticker for front-end charting
    • Compared to row-based solutions, fetching price data should be faster because it's a sequential read. But how does storing multiple tickers in one place influence this? For example, a statement like "select all timestamps and price data where ticker is equal to something". Don't I have to compare the ticker on every row I fetch? And in the situation where I have to provide complete data for some front-end application, wouldn't serving a raw flat file for the instrument requested be more efficient?
  • Analytics performed in the back-end
    • Things like computing single values for a stock (e.g. variance, return for last x days) and dependent time-series (daily returns, technical indicators etc.). Fetching input data for computations should be more efficient, as in the preceding case, but what about writing? The gain I see is bulk writing the final result (like the value of a computed indicator for every timestamp), but I still don't know how the database handles my mashup of different tickers in one table. Does horizontal partitioning/sharding handle it for me automatically, or am I better off splitting manually into a table-per-instrument structure (which seems unnecessarily cumbersome)?
  • Updating the database with new incoming ticks
    • Using row-based orientation would be more efficient here, wouldn't it? The same goes for updating aggregated data (for example, daily OHLC tables). Won't this be a possible bottleneck?

All this is in the context of available open source solutions. I thought initially about InfiniDB or HBase, but I've seen MonetDB and InfoBright being mentioned around here too. I don't really need "production quality" (at least not yet) as mentioned by chrisaycock in the referenced question, so would any of these be a better choice than the others?

And the last issue: from approximately which load point do specialized time-series databases become necessary? Unfortunately, things like kdb+ or FAME are out of scope in this case, so I'm contemplating how much can be done on commodity hardware with standard relational databases (MySQL/PostgreSQL) or key-value stores (like Tokyo/Kyoto Cabinet's B+ tree) - is that really a dead end? Should I just stick with one of the aforementioned column-oriented solutions, given that my application is not mission-critical, or is even that an unnecessary precaution?

Thanks in advance for your input on this. If some part is too convoluted, let me know in a comment. I will try to amend accordingly.


It seems that strictly speaking HBase is not a column-oriented store but rather a sparse, distributed, persistent multidimensional sorted map, so I've crossed it out from the original question.

After some research I'm mostly inclined towards InfiniDB. It has all the features I need, supports SQL (standard MySQL connectors/wrappers can be used for access) and a full DML subset. The only thing missing in the open source edition is on-the-fly compression and scaling out to clusters. But I guess it's still a good bang for the buck, considering it's free.
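For the relational end of that spectrum, the per-row ticker comparison worry is normally answered by a composite index rather than a scan; a minimal sketch with SQLite (schema and numbers are illustrative only, not a performance claim):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bars (
        ticker TEXT    NOT NULL,
        ts     INTEGER NOT NULL,     -- minute timestamp
        open REAL, high REAL, low REAL, close REAL, volume INTEGER,
        PRIMARY KEY (ticker, ts)     -- composite index: per-ticker range reads
    )""")

# One trading day of fake minute bars for a single ticker.
rows = [("AAPL", t, 100.0, 101.0, 99.5, 100.5, 1000) for t in range(390)]
conn.executemany("INSERT INTO bars VALUES (?,?,?,?,?,?,?)", rows)

# Fetching one ticker's series is an index range scan over (ticker, ts),
# not a row-by-row comparison against the whole table.
series = conn.execute(
    "SELECT ts, close FROM bars WHERE ticker = ? AND ts BETWEEN ? AND ? "
    "ORDER BY ts",
    ("AAPL", 0, 59)).fetchall()
print(len(series))  # 60 bars in the first hour
```

The same (ticker, timestamp) composite-key idea carries over to MySQL/PostgreSQL and to most key-value stores with ordered keys.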

by Karol Piczak at August 22, 2016 12:55 AM


How to send audit logs with audisp-remote and receive them with netcat

I am trying to configure a CentOS 7 running in VirtualBox to send its audit logs to the host which is FreeBSD 10.3. Ideally, I'd like to receive the logs with FreeBSD's auditdistd(8) but for now I'd just like to be able to use netcat for that.

My problem is that netcat doesn't get any data.


  1. When I run service auditd status I get the following results:

    Redirecting to /bin/systemctl status  auditd.service
    auditd.service - Security Auditing Service
       Loaded: loaded (/usr/lib/systemd/system/auditd.service; enabled; vendor preset: enabled)
       Active: active (running) since Fri 2016-08-19 11:35:42 CEST; 3s ago
      Process: 2216 ExecStartPost=/sbin/augenrules --load (code=exited, status=1/FAILURE)
     Main PID: 2215 (auditd)
       CGroup: /system.slice/auditd.service
               ├─2215 /sbin/auditd -n
               └─2218 /sbin/audispd
    Aug 19 11:35:42 hephaistos audispd[2218]: plugin /sbin/audisp-remote was restarted
    Aug 19 11:35:42 hephaistos audispd[2218]: plugin /sbin/audisp-remote terminated unexpectedly
    Aug 19 11:35:42 hephaistos audispd[2218]: plugin /sbin/audisp-remote was restarted
    Aug 19 11:35:42 hephaistos audispd[2218]: plugin /sbin/audisp-remote terminated unexpectedly
    Aug 19 11:35:42 hephaistos audispd[2218]: plugin /sbin/audisp-remote was restarted
    Aug 19 11:35:42 hephaistos audispd[2218]: plugin /sbin/audisp-remote terminated unexpectedly
    Aug 19 11:35:42 hephaistos audispd[2218]: plugin /sbin/audisp-remote was restarted
    Aug 19 11:35:42 hephaistos audispd[2218]: plugin /sbin/audisp-remote terminated unexpectedly
    Aug 19 11:35:42 hephaistos audispd[2218]: plugin /sbin/audisp-remote has exceeded max_restarts
    Aug 19 11:35:42 hephaistos audispd[2218]: plugin /sbin/audisp-remote was restarted


Network Setup

  1. CentOS and FreeBSD are connected on a host-only network. I've assigned them the following IP's:

    • CentOS:
    • FreeBSD:

FreeBSD Setup

  1. I've got netcat listening on port 60:

    nc -lk 60

    The connection works. I can use nc 60 on CentOS to send data to FreeBSD.

CentOS Setup

  1. The kernel version is: 4.7.0-1.el7.elrepo.x86_64 #1 SMP Sun Jul 24 18:15:29 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux.
  2. The version of Linux Audit userspace is 2.6.6.
  3. auditd is running and actively logging to /var/log/audit.log.
  4. The auditing rules in /etc/audit/rules.d/ are well configured.
  5. The configuration of /etc/audisp/audisp-remote.conf looks like this:

    remote-server =
    port = 60
    local_port = any
    transport = tcp
    mode = immediate
  6. I've got two default files in /etc/audisp/plugins.d/: syslog.conf and af_unix.conf, and neither of them is active. I've added af-remote.conf and it looks like this:

    # This file controls the audispd data path to the
    # remote event logger. This plugin will send events to
    # a remote machine (Central Logger).
    active = yes
    direction = out
    path = /sbin/audisp-remote
    type = always
    #args =
    format = string

    It is a modified example from the official repository (link).

  7. Here's the content of /etc/audisp/audispd.conf:

    q_depth = 150
    overflow_action = SYSLOG
    priority_boost = 4
    max_restarts = 10
    name_format = HOSTNAME

I'll be happy to provide more details if needed.

by Mateusz Piotrowski at August 22, 2016 12:55 AM


Reversing a list in another list in Haskell

I'm quite new to Haskell and I'm trying to reverse a list. At the same time I want to reverse the lists in that list. So for example:

Prelude> rev [[3,4,5],[7,5,2]]

I know that the following code reverses a list:

rev :: [[a]] -> [[a]]
rev [[]] = [[]]
rev [[x]] = [[x]]
rev xs = last xs : reverse (init xs)

I have been struggling for a while; I have made some additions to the code, but it still isn't working and I'm stuck.

rev :: [[a]] -> [[a]]
rev [[]] = [[]]
rev [[x]] = [[x]]
rev xs = last xs : reverse (init xs)
rev [xs] = last [xs] : reverse (init [xs])

I'd appreciate any help. Thanks in advance.
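One way to get this behaviour is to compose the Prelude's reverse with a map of itself; no special cases for empty lists are needed (a minimal sketch, one idiom among several):

```haskell
-- Reverse the outer list, and every inner list too.
rev :: [[a]] -> [[a]]
rev = reverse . map reverse

-- rev [[3,4,5],[7,5,2]] evaluates to [[2,5,7],[5,4,3]]
```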

by ZCoder at August 22, 2016 12:49 AM



ConvNet layers not showing activity and dropping to zero after a few minibatches

I am attempting to train a convolutional neural network using Tensorflow. The structure of my network is similar to VGG, except smaller - I'm using 3 pooling layers, 2 fully connected layers, and 252 target classes. I am using several layers of abstraction to make the final graph more readable. The base conv2d is like this:

def conv2d(inputs, n_out, kernel, step, **kwargs):
    name = kwargs.pop('name', 'conv')
    relu = kwargs.pop('relu', True)
    bias = kwargs.pop('bias', True)
    padding = kwargs.pop("padding", "SAME")

    channels_in = inputs.get_shape().as_list()[-1]

    with tf.variable_scope(name) as scope:
        # Xavier/zero initializers; any sensible choice works here.
        weights = tf.get_variable('weights', [kernel, kernel, channels_in, n_out],
                                  initializer=tf.contrib.layers.xavier_initializer())
        convolve = tf.nn.conv2d(inputs, weights, [1, step, step, 1],
                                padding=padding)
        if bias is True:
            bias_layer = tf.get_variable("biases", [n_out],
                                         initializer=tf.constant_initializer(0.0))
            convolve = tf.nn.bias_add(convolve, bias_layer)

        if relu is True:
            convolve = tf.nn.relu(convolve)

        return convolve

Affine layer

def affine(inputs, n_out, **kwargs):
    name = kwargs.pop("name", "affine")
    bias = kwargs.pop("bias", True)
    relu = kwargs.pop("relu", True)

    input_shape = inputs.get_shape().as_list()
    if len(input_shape) == 4:
        n_in = reduce(mul, input_shape[1:], 1)
        inputs = tf.reshape(inputs, shape=[-1, n_in])
    else:
        n_in = input_shape[-1]

    with tf.variable_scope(name) as scope:
        # initializer arguments were truncated in the original post
        weights = tf.get_variable('weights', [n_in, n_out])
        fc = tf.matmul(inputs, weights)

        if bias is True:
            bias_layer = tf.get_variable("biases", [n_out])
            fc = tf.nn.bias_add(fc, bias_layer)

        if relu is True:
            fc = tf.nn.relu(fc)

        return fc

max pooling op

def max_pool(inputs, ksize, stride, **kwargs):
    name = kwargs.pop("name", "max_pool")
    padding = kwargs.pop("padding", "SAME")
    with tf.variable_scope(name):
        pool = tf.nn.max_pool(inputs, ksize=[1, ksize, ksize, 1],
                              strides=[1, stride, stride, 1],
                              padding=padding)
        return pool

batch normalization for input images

def batch_norm(inputs):
    with tf.variable_scope('batch_norm'):
        mean, var = tf.nn.moments(inputs, axes=[0, 1, 2])
        return tf.nn.batch_normalization(inputs, mean, var, 
                                     offset=0, scale=1.0, variance_epsilon=1e-6)

max pooling group (only using 2 convolutions per group)

def max_pool_group(name, inputs, n_out, conv_k=3, conv_s=1, pool_k=2, pool_s=2):
    with tf.variable_scope(name):
        conv1 = conv2d(inputs, n_out, conv_k, conv_s, name='conv1')
        conv2 = conv2d(conv1, n_out, conv_k, conv_s, name='conv2')
        pool = max_pool(conv2, pool_k, pool_s, name='pool1')
        variable_summaries(pool, name)
        return pool   

With all that, my model (minus the other Tensorflow boilerplate stuff) is defined as

inputs = tf.placeholder(tf.float32, name='input', shape=[batch_size, 224, 224, 3])
target = tf.placeholder(tf.float32, name='target', shape=[batch_size, n_classes])
learning_rate_ph = tf.placeholder(tf.float32, name='learning_rate', shape=[])

def model(inputs):
    normed = batch_norm(inputs)
    pool1 = max_pool_group('pool1', normed, 64)
    pool2 = max_pool_group('pool2', pool1, 128)
    pool3 = max_pool_group('pool3', pool2, 256)
    fc1 = affine(pool3, 2048, name='fc1')
    variable_summaries(fc1, 'fc1')

    fc2 = affine(fc1, 2048, name='fc2')
    variable_summaries(fc2, 'fc2')

    logits = affine(fc2, n_classes, relu=False, name='logits')
    variable_summaries(logits, 'logits')
    return logits

logits = model(inputs)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, target))
tf.scalar_summary('loss', loss)

optimizer = tf.train.GradientDescentOptimizer(learning_rate_ph).minimize(loss)
train_prediction = tf.nn.softmax(logits)

The issue I'm having is that after only a few minibatches, pool1 and pool2 drop to 0 and show no activity. You can see in the image that parameters are still being adjusted in other layers. I've checked the input images and they all check out, the labels all line up, and I have a scheme to drop the learning rate after every 30 epochs. I've changed the fully connected layer sizes between 2048 and 4096 and I've changed the depth of the network to look like VGG-D, with the same response -- the pooling layers converge to 0. Training on a handful of images (2-10), the model will hit targets and the loss will drop, but the layers still converge to 0. What could I be doing wrong? Is there something obvious I'm missing?

Thanks in advance for your help.
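As an aside, a tiny diagnostic sketch (mine, not from the post): logging the fraction of exactly-zero activations per layer makes this failure mode visible. If pool1's fraction races to 1.0 within a few minibatches, the ReLUs have died, which most often points to a learning rate too high for the weight initialization.

```python
import numpy as np

def dead_fraction(activations):
    """Fraction of activation entries that are exactly zero.
    For ReLU layers, a value near 1.0 that persists across
    minibatches is the classic 'dying ReLU' signature."""
    a = np.asarray(activations, dtype=float)
    return float(np.mean(a == 0))

# e.g. fetch the pool1 tensor in and log dead_fraction(pool1_values)
```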

by Andrew at August 22, 2016 12:30 AM


What data structure for this operation? [duplicate]

This question already has an answer here:

Say I have two arrays, { energy and score }, and they could have elements like this:

E: 1  3  4 7
S: 16 10 5 1

What I want is the best score with the best energy. What data structure can support inserting items such that I never keep an item with both less energy and less score than another item, i.e. for any i, j where i != j => score[i] > score[j] || energy[i] > energy[j]?

When inserting, I do three steps:
1- if any item has more or equal score and energy, return;
2- if any item has less or equal score and energy, remove this item;
3- insert the required item.

Here are some examples:

1- insert e=8, s=1. The arrays become:

E: 1  3  4 8
S: 16 10 5 1

2- insert e=5, s=6. The arrays become:

E: 1  3  5 8
S: 16 10 6 1

3- insert e=5, s=11. The arrays become:

E: 1  5  8
S: 16 11 1
      ^    (3,10) is removed because (5,11) has more energy and more score than it.

What data structure can support this in (hopefully) O(logn) time?
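A sketch of one way to do this (my illustration, not from the post; the class name is made up): keep the surviving pairs sorted by energy, so scores are automatically sorted the other way. The dominance check is then a single binary search, and every item removed in step 2 was paid for by its own earlier insertion, so the work is O(log n) amortized per insert apart from Python-list element shifting; a balanced BST or skip list would give a true O(log n) bound.

```python
import bisect

class ParetoSet:
    """Keeps only mutually non-dominated (energy, score) pairs:
    energies strictly increasing, scores strictly decreasing."""

    def __init__(self):
        self.energy = []   # sorted ascending
        self.score = []    # paired with energy, hence sorted descending

    def insert(self, e, s):
        i = bisect.bisect_left(self.energy, e)
        # Among items with energy >= e, the highest score is at index i.
        if i < len(self.energy) and self.score[i] >= s:
            return                            # step 1: (e, s) is dominated
        hi = i
        if hi < len(self.energy) and self.energy[hi] == e:
            hi += 1                           # equal energy, lower score
        lo = i
        while lo > 0 and self.score[lo - 1] <= s:
            lo -= 1                           # step 2: items (e, s) dominates
        del self.energy[lo:hi]
        del self.score[lo:hi]
        self.energy.insert(lo, e)             # step 3: insert the new item
        self.score.insert(lo, s)
```

Running the three examples above through this class reproduces the arrays shown, including the removal of (3, 10) after inserting (5, 11).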

by Ayman El Temsahi at August 22, 2016 12:25 AM


How to extract all the ticker symbols of an exchange with Quantmod in R?

I am using the Quantmod package in R for some data analysis. I can download the price history of a particular stock or index with the following code:

library(quantmod) # Loading quantmod library

getSymbols("^DJI", from = as.character(Sys.Date()-365*3))

I want to download all the ticker symbols that are constituents of a particular index, such as the DJI for example. What is the best way to do that in R?

Thanks a lot in advance.

by Deb at August 22, 2016 12:13 AM


Multi-point evaluations of a polynomial mod p

Given a polynomial of degree $n$ modulo a prime $p$, I want to evaluate that polynomial at multiple values of the variable $x$. What is the best way to do this?
I tried using Berlekamp's algorithm for factorization, but it takes $O(n^3)$ just to factorize and then $O(n)$ per point evaluation. Is there another way to bring the complexity down considerably, say to $n\log(q)$ where $q$ is the number of points I want to evaluate the polynomial at? Or possibly polynomial time? All the coefficients and the values of $x$ lie between $0$ and $p - 1$, and the prime is of order $10^6$.
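As an aside (a sketch, not from the original question): no factorization is needed for evaluation alone. Horner's rule evaluates a degree-$n$ polynomial at one point in $O(n)$ modular multiplications, so $q$ points cost $O(nq)$; subproduct-tree multipoint evaluation improves this to $O(n\log^2 n)$ total when $q \approx n$.

```python
def eval_poly_mod(coeffs, xs, p):
    """Evaluate a polynomial (coefficients given highest-degree first)
    at each point in xs, modulo the prime p, using Horner's rule."""
    results = []
    for x in xs:
        acc = 0
        for c in coeffs:
            acc = (acc * x + c) % p
        results.append(acc)
    return results

# x^2 + 1 (mod 7) at x = 3 gives (9 + 1) % 7 = 3
print(eval_poly_mod([1, 0, 1], [3], 7))  # [3]
```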

by imanimefn at August 22, 2016 12:00 AM


HN Daily

August 21, 2016


Is this a viable method for testing market making strategies?

I found a video game market (steam community market) which allows for trading of in game items between users, most items are <0.25 USD each, and market capitalization appears to be maybe $5-$10 USD on some items. Something to be noted is the transaction fee is 15% which does limit the possibilities a bit.

One example item:

Some discoveries that should be taken into account:

Many items appear to have very consistent supply and demand, leaving them in a sort of equilibrium, so the drift in price is very low.

One thing I did find while manually market making is that occasionally a user will sell an item at a price lower than the bid, effectively making spreads go negative. In that case, the highest bidder receives the item. This occurs on average about 1 out of every 15 times a trade is cleared, but will occasionally go as long as 100 trades between opportunities. I created a program that constantly re-lists bid and offer quotes at the same price as soon as they are filled. By clearing very high volumes (over 1k items a day) I managed to turn $1 into $14 USD within 3 weeks. This isn't as good as it sounds, given that the returns capped out at about 5 bucks invested, and I even got blocked from accessing the server after making 8k+ requests in an hour.

Given some less popular items have capitalizations of less than 5 USD it is possible for a dealer to accumulate almost all copies of the item in existence, allowing for the fixing of prices, perhaps useful for reducing volatility.

There is a one week holding period before the in game item is delivered to your inventory. This was implemented into the market before I attempted arbitrage based on negative spreads, and it was why I stopped my high volume strategy. A one week delivery delay means a very large open interest is required in order to provide liquidity 24/7.

Another thing, I have discovered is certain items are pretty much identical, but the amount they have been used in the game affects the value of the item and creates a spread, this could be used for correlation based stat-arb perhaps?

Modeling the market: For an item in equilibrium, the price will not drift much, so the time series can be assumed to be stationary.

Since it is easy to accumulate a massive open interest, at least compared to volume, one can assume for the sake of the model that there is unlimited buying and selling potential on behalf of the liquidity provider. I will also assume no one else is providing liquidity to the market, and that all other bids/offers come from individuals seeking the utility of the item rather than looking to make gains trading it.

$b$ will represent the dealers bid

$o$ will represent the dealers offer

$B_t$ will represent a process for the given item's market bid rate defined as $>= b$ with an unknown distribution.

$O_t$ will represent a process for the given item's market offer rate defined as $<= o$

$S_t$ can be defined as $O_t-B_t$

One hypothesis I have is that $B_t-b$ and $o-O_t$ are perhaps log-normally distributed.

A method I propose for verifying the distribution is to sample the percent of negative spread arbitrage opportunities and compare them to the expected amount of opportunities a given distribution expects for the Process $S_t$ which could be used for fitting a distribution.

My question:

Can any research done on this in-game market be applied to real market making in financial markets, or are there factors not accounted for?

by FreepromTech at August 21, 2016 11:47 PM


Find ellipsoid that contains intersection of an ellipsoid and a hyperplane

I have an $n$-dimensional ellipsoid $E$ and a hyperplane $H$. This hyperplane cuts $E$ into two parts: $E_1$ and $E_2$ (whose disjoint union is $E$). I want to find another ellipsoid $E'$ that has minimal hyper-volume and contains $E_1$. Is there an efficient algorithm to do this?

My first thought was to formulate it as an optimization problem, but I am having difficulty with formulating it, as I don't know how to formulate the containment ($E_1 \subseteq E'$) constraint.

An approximation for the minimal hyper-volume ellipsoid is also good for my needs.
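As a pointer (mine, not from the original question): when the hyperplane passes through the center of $E$, the minimum-volume ellipsoid containing the half-ellipsoid has a closed form; it is exactly the central-cut update used by the ellipsoid method, and deep-cut variants handle hyperplanes that miss the center. A sketch, representing $E = \{x : (x-c)^\top A^{-1}(x-c) \le 1\}$ and the half $\{x \in E : a^\top x \le a^\top c\}$:

```python
import numpy as np

def central_cut_update(c, A, a):
    """Minimum-volume ellipsoid containing the half of
    {x : (x-c)^T A^{-1} (x-c) <= 1} with a^T x <= a^T c.
    Classical ellipsoid-method update; requires n >= 2."""
    n = len(c)
    Aa = A @ a
    g = Aa / np.sqrt(a @ Aa)                  # normalized gradient step
    c_new = c - g / (n + 1)                   # shift center into kept half
    A_new = (n**2 / (n**2 - 1.0)) * (A - (2.0 / (n + 1)) * np.outer(g, g))
    return c_new, A_new
```

For example, the left half of the unit disk ($n = 2$, $A = I$, $c = 0$, $a = (1,0)$) yields the ellipse centered at $(-1/3, 0)$ with squared semi-axes $4/9$ and $4/3$, which touches the half-disk at $(-1,0)$ and $(0,\pm 1)$.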

by Dudi Frid at August 21, 2016 11:46 PM

Higher order verification in a complete logic

I'd like to design a language that is able to reason about itself, that is, able to take as input code in that language (which might have gone through some external redundant preprocessing, or "reflection" to use another term) and reason about it. MLTT is of course a natural choice. But I am seeking a logically complete language, obviously sacrificing expressiveness (in fact it must not be able to express arithmetic; otherwise, by Gödel, or more directly by the mortal matrix problem or Hilbert's 10th problem, it cannot be complete). Therefore as a first candidate I thought of MSO over graphs. My question is how MSO over graphs can reason about itself, or more precisely: if we had a Prolog-like (kb & query) language in the logic of MSO over graphs, how could it interpret or compile itself (and, as a byproduct, also reason about itself)?

by Troy McClure at August 21, 2016 11:33 PM

Can Tree Transducers Self-Interpret?

Is it possible for MSO graph/tree transducers to reflect themselves, namely to create an interpreter of tree/graph transducers using tree/graph transducers? If yes, I'll be happy for some design guidelines.

by Troy McClure at August 21, 2016 11:24 PM


What is Toxic FX Flow debate?

So, basically I want to debate and find out the real reason behind being flagged by ECNs and venues as "toxic". How can one avoid being flagged? What kinds of strategies are toxic, and why?

Below is an article written by a brokerage firm... so the opinion in that article may not be entirely objective.

Article about toxic FX flow

Any thoughts?

by Ariel Silahian at August 21, 2016 11:15 PM


Scikit-Learn: Std.Error, p-Value from LinearRegression

I've been trying to get the standard errors & p-values using LinearRegression from scikit-learn, but without success.

I ended up finding this article: but the std. error & p-values it produces do not match those from the statsmodels.api OLS method

import numpy as np 
from sklearn import datasets
from sklearn import linear_model
import regressor
import statsmodels.api as sm 

boston = datasets.load_boston()
which_betas = np.ones(13, dtype=bool)
which_betas[3] = False
X =[:, which_betas]
y =

#scikit + regressor stats
ols = linear_model.LinearRegression(), y)

xlables = boston.feature_names[which_betas]
regressor.summary(ols, X, y, xlables)

# statsmodel
x2 = sm.add_constant(X)
models = sm.OLS(y,x2)
result =
print result.summary()

Output as follows:

Min      1Q  Median      3Q      Max
-26.3743 -1.9207  0.6648  2.8112  13.3794

             Estimate  Std. Error  t value   p value
_intercept  36.925033    4.915647   7.5117  0.000000
CRIM        -0.112227    0.031583  -3.5534  0.000416
ZN           0.047025    0.010705   4.3927  0.000014
INDUS        0.040644    0.055844   0.7278  0.467065
NOX        -17.396989    3.591927  -4.8434  0.000002
RM           3.845179    0.272990  14.0854  0.000000
AGE          0.002847    0.009629   0.2957  0.767610
DIS         -1.485557    0.180530  -8.2289  0.000000
RAD          0.327895    0.061569   5.3257  0.000000
TAX         -0.013751    0.001055 -13.0395  0.000000
PTRATIO     -0.991733    0.088994 -11.1438  0.000000
B            0.009827    0.001126   8.7256  0.000000
LSTAT       -0.534914    0.042128 -12.6973  0.000000
R-squared:  0.73547,    Adjusted R-squared:  0.72904
F-statistic: 114.23 on 12 features
                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.735
Model:                            OLS   Adj. R-squared:                  0.729
Method:                 Least Squares   F-statistic:                     114.2
Date:                Sun, 21 Aug 2016   Prob (F-statistic):          7.59e-134
Time:                        21:56:26   Log-Likelihood:                -1503.8
No. Observations:                 506   AIC:                             3034.
Df Residuals:                     493   BIC:                             3089.
Df Model:                          12                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
const         36.9250      5.148      7.173      0.000        26.811    47.039
x1            -0.1122      0.033     -3.405      0.001        -0.177    -0.047
x2             0.0470      0.014      3.396      0.001         0.020     0.074
x3             0.0406      0.062      0.659      0.510        -0.081     0.162
x4           -17.3970      3.852     -4.516      0.000       -24.966    -9.828
x5             3.8452      0.421      9.123      0.000         3.017     4.673
x6             0.0028      0.013      0.214      0.831        -0.023     0.029
x7            -1.4856      0.201     -7.383      0.000        -1.881    -1.090
x8             0.3279      0.067      4.928      0.000         0.197     0.459
x9            -0.0138      0.004     -3.651      0.000        -0.021    -0.006
x10           -0.9917      0.131     -7.547      0.000        -1.250    -0.734
x11            0.0098      0.003      3.635      0.000         0.005     0.015
x12           -0.5349      0.051    -10.479      0.000        -0.635    -0.435
Omnibus:                      190.837   Durbin-Watson:                   1.015
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              897.143
Skew:                           1.619   Prob(JB):                    1.54e-195
Kurtosis:                       8.663   Cond. No.                     1.51e+04

[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.51e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

I've also found the following articles

Neither of the code samples in the SO links compiles.

Here is the code & data I'm working on, but I have not been able to get the std. error & p-values:

import pandas as pd
import statsmodels.api as sm
import numpy as np
import scipy
from sklearn.linear_model import LinearRegression
from sklearn import metrics 

def readFile(filename, sheetname):
    xlsx = pd.ExcelFile(filename)
    data = xlsx.parse(sheetname, skiprows=1)
    return data

def lr_statsmodel(X,y):
    X = sm.add_constant(X)
    model = sm.OLS(y,X)
    results =
    print (results.summary())

def lr_scikit(X,y,featureCols):
    model = LinearRegression()
    results =,y)

    predictions =  results.predict(X)

    print 'Coefficients'
    print 'Intercept\t' , results.intercept_
    df = pd.DataFrame(zip(featureCols, results.coef_))
    print df.to_string(index=False, header=False)

    # Query:: The numbers matches with Excel OLS but skeptical about relating score as rsquared
    rSquare = results.score(X,y)
    print '\nR-Square::', rSquare

    # This looks like a better option
    # source:
    r2 = metrics.r2_score(y,results.predict(X))
    print 'r2', r2

    # Query: No clue at all! 
    print 'Rsquared?!' , metrics.explained_variance_score(y, results.predict(X))
    # INFO:: All three of them are providing the same figures!     

    # Adj-Rsquare formula @
    # In ML, we don't use all of the data for training, and hence its highly unusual to find AdjRsquared. Thus the need for manual calculation
    N = X.shape[0]
    p = X.shape[1]
    adjRsquare = 1 - ((1 -  rSquare ) * (N - 1) / (N - p - 1))
    print "Adjusted R-Square::", adjRsquare

    # calculate standard errors
    # mean_absolute_error
    # mean_squared_error
    # median_absolute_error 
    # r2_score
    # explained_variance_score
    mse = metrics.mean_squared_error(y,results.predict(X))
    print mse
    print 'Residual Standard Error:', np.sqrt(mse)

    # OLS in Matrix :
    n = X.shape[0]
    X1 = np.hstack((np.ones((n, 1)), np.matrix(X)))    
    se_matrix = scipy.linalg.sqrtm(
        metrics.mean_squared_error(y, results.predict(X)) *
        np.linalg.inv(X1.T * X1)
    )
    print 'se', np.diagonal(se_matrix)


    y_hat = results.predict(X)
    sse = np.sum((y_hat - y) ** 2)
    print 'Standard Square Error of the Model:', sse

if __name__ == '__main__':

    # read file 
    fileData = readFile('Linear_regression.xlsx','Input Data')

    # list of independent variables 
    feature_cols = ['Price per week','Population of city','Monthly income of riders','Average parking rates per month']

    # build dependent & independent data set 
    X = fileData[feature_cols]
    y = fileData['Number of weekly riders']

    # Statsmodel - OLS 
#    lr_statsmodel(X,y)

    # ScikitLearn - OLS 
    lr_scikit(X, y, feature_cols)

My data-set

Y   X1  X2  X3  X4
City    Number of weekly riders Price per week  Population of city  Monthly income of riders    Average parking rates per month
1   1,92,000    $15     18,00,000   $5,800  $50
2   1,90,400    $15     17,90,000   $6,200  $50
3   1,91,200    $15     17,80,000   $6,400  $60
4   1,77,600    $25     17,78,000   $6,500  $60
5   1,76,800    $25     17,50,000   $6,550  $60
6   1,78,400    $25     17,40,000   $6,580  $70
7   1,80,800    $25     17,25,000   $8,200  $75
8   1,75,200    $30     17,25,000   $8,600  $75
9   1,74,400    $30     17,20,000   $8,800  $75
10  1,73,920    $30     17,05,000   $9,200  $80
11  1,72,800    $30     17,10,000   $9,630  $80
12  1,63,200    $40     17,00,000   $10,570 $80
13  1,61,600    $40     16,95,000   $11,330 $85
14  1,61,600    $40     16,95,000   $11,600 $100
15  1,60,800    $40     16,90,000   $11,800 $105
16  1,59,200    $40     16,30,000   $11,830 $105
17  1,48,800    $65     16,40,000   $12,650 $105
18  1,15,696    $102    16,35,000   $13,000 $110
19  1,47,200    $75     16,30,000   $13,224 $125
20  1,50,400    $75     16,20,000   $13,766 $130
21  1,52,000    $75     16,15,000   $14,010 $150
22  1,36,000    $80     16,05,000   $14,468 $155
23  1,26,240    $86     15,90,000   $15,000 $165
24  1,23,888    $98     15,95,000   $15,200 $175
25  1,26,080    $87     15,90,000   $15,600 $175
26  1,51,680    $77     16,00,000   $16,000 $190
27  1,52,800    $63     16,10,000   $16,200 $200

I've exhausted all my options and everything I could make sense of. Any guidance on how to compute the std. error & p-values so that they match statsmodels.api is appreciated.

EDIT: I'm trying to find the std error & p-values for intercept and all the independent variables
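For what it's worth, a sketch (mine, not from the post) of how the statsmodels numbers can be reproduced by hand. The key detail is that the residual variance must be divided by n - p - 1 (the residual degrees of freedom), not by n as metrics.mean_squared_error does, which is a likely source of the mismatch above. The helper name ols_inference is made up for illustration:

```python
import numpy as np
from scipy import stats

def ols_inference(X, y, coef, intercept):
    """Standard errors and p-values for a fitted OLS model,
    matching statsmodels: residual variance uses n - p - 1 dof."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])     # prepend intercept column
    beta = np.concatenate([[intercept], np.ravel(coef)])
    resid = y - X1 @ beta
    sigma2 = resid @ resid / (n - p - 1)      # unbiased residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))
    t = beta / se
    pvals = 2 * stats.t.sf(np.abs(t), df=n - p - 1)
    return se, pvals
```

Pass scikit-learn's `results.coef_` and `results.intercept_` in as `coef` and `intercept`; since scikit-learn and statsmodels agree on the coefficients themselves, the standard errors and p-values computed this way should line up with the OLS summary.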

by user6083088 at August 21, 2016 10:20 PM


Sum of size of distinct set of descendants $d$ distance from a node $u$, over all $u$ and $d$ is $\mathcal{O}(n\sqrt{n})$

Let's consider a rooted tree $T$ of $n$ nodes. For any node $u$ of the tree, define $L(u,d)$ to be the list of descendants of $u$ that are distance $d$ away from $u$. Let $|L(u,d)|$ denote the number of nodes that are present in the list $L(u,d)$.

Prove that the sum of $|L(u,d)|$ over all distinct lists $L(u,d)$ is bounded by $\mathcal{O}(n\sqrt{n})$.

My work

Consider all $L(u,d)$ such that the left most node on the level $Level(u) + d$ is some node $v$. The pairs $u, d$ for all such $L(u,d)$ must be distinct and the sum of all $d_i$ will correspond to the number of nodes $x$ in the tree with $Level(x) \le Level(u) + d$.

This is because if some sequence of nodes $v_1, v_2, \dots v_k$ corresponds to the descendants of some node $u$ at a distance $d$ and the sequence of nodes $v_1, v_2, \dots v_{k'}$ where $k' > k$ corresponds to the descendants of some node $u'$ at a distance $d+1$, then there must also exist a node $u''$ such that $L(u'', d) = v_{k+1}, v_{k+2}, \dots v_{k'}$. This would also mean that $u''$ is not in the subtree of $u$ and thus there are at least $d$ distinct nodes in the subtree of $u''$ upto a distance $d$ from $u''$.

If the distinct distances are $d_1, d_2, \dots, d_k$ then $n \ge \sum_{i}d_i \ge \sum_{i=1}^{k}i = \frac{k(k+1)}{2}$.

$\implies k = \mathcal{O}(\sqrt{n})$

After this I tried to show that there can be only $\mathcal{O}(\sqrt{n})$ distinct lists $L(u,d)$, so that I could then trivially obtain the upper bound of $n\sqrt{n}$, but I could not make any more useful observations.

This link claims that such an upper bound does exist but has not provided the proof.

Any ideas how we might proceed to prove this?

by Banach Tarski at August 21, 2016 10:13 PM



These patterns look exhaustive to me, "Non-exhaustive patterns error" ? Why?


While taking notes from a Haskell book, this code example should return Left [NameEmpty, AgeTooLow], but it only returns the first case, Left [NameEmpty]. Then when I pass mkPerson2 arguments for which it should return Right (Person _ _), I get back a Non-exhaustive patterns error. I've looked over this code for quite some time, but it looks right to me. What am I missing here? Any explanation would be absolutely appreciated, thanks!

Book I'm using


module EqCaseGuard where

type Name = String
type Age  = Integer
type ValidatePerson a = Either [PersonInvalid] a

data Person = Person Name Age deriving Show

data PersonInvalid = NameEmpty
                   | AgeTooLow
                   deriving Eq

ageOkay :: Age -> Either [PersonInvalid] Age
ageOkay age = case age >= 0 of
  True  -> Right age
  False -> Left [AgeTooLow]

nameOkay :: Name -> Either [PersonInvalid] Name
nameOkay name = case name /= "" of
  True  -> Right name
  False -> Left [NameEmpty]

mkPerson2 :: Name -> Age -> ValidatePerson Person
mkPerson2 name age = mkPerson2' (nameOkay name) (ageOkay age)

mkPerson2' :: ValidatePerson Name -> ValidatePerson Age -> ValidatePerson Person
mKPerson2' (Right nameOk) (Right ageOk) = Right (Person nameOk ageOk)
mKPerson2' (Left badName) (Left badAge) = Left (badName ++ badAge)
mkPerson2' (Left badName)  _            = Left badName
mkPerson2' _              (Left badAge) = Left badAge


*EqCaseGuard> mkPerson2 "jack" 22
*** Exception: eqCaseGuard.hs:(54,1)-(55,53): Non-exhaustive patterns in function mkPerson2'

*EqCaseGuard> mkPerson2 "" (-1)
Left [NameEmpty]

by Jonathan Portorreal at August 21, 2016 10:11 PM


One thing I always find totally wild about the US, ...

One thing I always find totally wild about the US is the brazenly euphemistic cover names the lobby groups there give themselves. The big deregulation lobby group, for example, is called "Americans for Prosperity", as if deregulation led to prosperity (except for the already super-rich, of course). The gun-regulation lobby calls itself things like "Americans for Responsible Solutions" or "Independence USA" (lolwut?), while the more-guns-for-everyone lobby calls itself the "Institute for Legislative Action" (which of course does the opposite: it obstructs gun-control legislation). One of the better-known lobby groups against regulation of the fast-food, meat, alcohol, and tobacco industries is called the "Center for Consumer Freedom".

And now … I find out that Germany has a "Verein zur Erhaltung der Rechtsstaatlichkeit und bürgerlichen Freiheiten" ("Association for the Preservation of the Rule of Law and Civil Liberties")!

No, really!

And what does it do?

You'll NEVER guess!

It puts up pro-AfD campaign posters!

August 21, 2016 10:00 PM


interactive brokers market order slippage

I have been considering using the Interactive Brokers API for my automated trading platform; however, I would like to see if anybody here has experience with the quality of the service.

My main concern is how much slippage is on average present with market orders.

I understand that the answer depends strongly on the type of instrument being traded, so assume only highly liquid large-cap names such as those listed on the Dow or the S&P.

Since I am still developing and testing my quantitative strategy, I would just like to know what kind of typical slippage I should account for in my testing models.

Edit: also, if someone can recommend a service they feel is better than Interactive Brokers, please let me know (I'm located in Canada).

by abcla at August 21, 2016 09:48 PM


RXJava with Retrofit2 I can't retrieve the server response, nor a simple Log.d

I can see in the debugger that the array is retrieved, but when I put a Log.d with the first element of the array inside onNext, the log never appears in the console, and I am not sure I am accessing the first element of the array correctly.

Log.d("IVO", "onNext" + stackOverflowQuestions.items.get(0).title.toString());

this is the Main


import (..)

public class MainActivity extends ListActivity {

protected void onCreate(Bundle savedInstanceState) {

    ArrayAdapter<Question> arrayAdapter =
            new ArrayAdapter<Question>(this,
                    new ArrayList<Question>());

public boolean onCreateOptionsMenu(Menu menu) {
    getMenuInflater().inflate(, menu);
    return true;

public boolean onOptionsItemSelected(MenuItem item) {
    Gson gson = new GsonBuilder()
    Retrofit retrofit = new Retrofit.Builder()

    // prepare call in Retrofit 2.0
    StackOverflowAPI stackOverflowAPI = retrofit.create(StackOverflowAPI.class);

    //the real call to the server
    //Call<StackOverflowQuestions> call = stackOverflowAPI.loadQuestions("android");

    Observable<StackOverflowQuestions> observable = stackOverflowAPI.loadQuestions("android");

            .subscribe(new Subscriber<StackOverflowQuestions>() {
                public void onCompleted() {
                    Log.d("IVO", "completed");

                public void onError(Throwable e) {


                public void onNext(StackOverflowQuestions stackOverflowQuestions) {

                    Log.d("IVO", "onNext" + stackOverflowQuestions.items.get(0).title.toString());
                    Log.d("IVO", "onNext" );
//                        ArrayAdapter<Question> adapter = (ArrayAdapter<Question>) getListAdapter();
//                        adapter.clear();
//                        adapter.addAll(response.body().items);

    return true;


this is StackOverflowQuestions


import java.util.List;

public class StackOverflowQuestions {
    List<Question> items;
}

this is Question


// This is used to map the JSON keys to the object by GSON
public class Question {

    String title;
    String link;

    public String toString() {

EDIT StackOverflowAPI as requested:


import android.util.Log;

import retrofit2.Callback;
import retrofit2.http.GET;
import retrofit2.http.Query;
import retrofit2.Call;
import rx.Observable;

public interface StackOverflowAPI {
    //Call<StackOverflowQuestions> loadQuestions(@Query("tagged") String tags);
    Observable<StackOverflowQuestions> loadQuestions(@Query("tagged") String tags);
}


by trocchietto at August 21, 2016 09:10 PM

Wes Felter

Overcoming Bias

Talks Not About Info

You can often learn about your own world by first understanding some other world, and then asking if your world is more like that other world than you had realized. For example, I just attended WorldCon, the top annual science fiction convention, and patterns that I saw there more clearly also seem echoed in wider worlds.

At WorldCon, most of the speakers are science fiction authors, and the modal emotional tone of the audience is one of reverence. Attendees love science fiction, revere its authors, and seek excuses to rub elbows with them. But instead of just having social mixers, authors give speeches and sit on panels where they opine on many topics. When they opine on how to write science fiction, they are of course experts, but in fact they mostly prefer to opine on other topics. By presenting themselves as experts on a great many future, technical, cultural, and social topics, they help preserve the illusion that readers aren’t just reading science fiction for fun; they are also part of important larger conversations.

When science fiction books overlap with topics in space, physics, medicine, biology, or computer science, their authors often read up on those topics, and so can be substantially more informed than typical audience members. And on such topics actual experts will often be included on the agenda. Audiences may even be asked if any of them happen to have expertise on such a topic.

But the more that a topic leans social, and has moral or political associations, the less inclined authors are to read expert literatures on that topic, and the more they tend to just wing it and think for themselves, often on their feet. They less often add experts to the panel or seek experts in the audience. And relatively neutral analysis tends to be displaced by position taking – they find excuses to signal their social and political affiliations.

The general pattern here is: an audience has big reasons to affiliate with speakers, but prefers to pretend those speakers are experts on something, and they are just listening to learn about that thing. This is especially true on social topics. The illusion is exposed by facts like speakers not being chosen for knowing the most about a subject discussed, and those speakers not doing much homework. But enough audience members are ignorant of these facts to provide a sufficient fig leaf of cover to the others.

This same general pattern repeats all through the world of conferences and speeches. We tend to listen to talks and panels full of not just authors, but also generals, judges, politicians, CEOs, rich folks, athletes, and actors. Even when those are not the best informed, or even the most entertaining, speakers on a topic. And academic outlets tend to publish articles and books more for being impressive than for being informative. However, enough people are ignorant of these facts to let audiences pretend that they mainly listen to learn and get information, rather than to affiliate with the statusful.

Added 22Aug: We feel more strongly connected to people when we together visibly affirm our shared norms/values/morals. Which explains why speakers look for excuses to take positions.

by Robin Hanson at August 21, 2016 08:45 PM

Planet Theory

TR16-131 | Threshold Secret Sharing Requires a Linear Size Alphabet | Andrej Bogdanov, Siyao Guo, Ilan Komargodski

We prove that for every $n$ and $1 < t < n$ any $t$-out-of-$n$ threshold secret sharing scheme for one-bit secrets requires share size $\log(t + 1)$. Our bound is tight when $t = n - 1$ and $n$ is a prime power. In 1990 Kilian and Nisan proved the incomparable bound $\log(n - t + 2)$. Taken together, the two bounds imply that the share size of Shamir's secret sharing scheme (Comm. ACM '79) is optimal up to an additive constant even for one-bit secrets for the whole range of parameters $1 < t < n$. More generally, we show that for all $1 < s < r < n$, any ramp secret sharing scheme with secrecy threshold $s$ and reconstruction threshold $r$ requires share size $\log((r + 1)/(r - s))$. As part of our analysis we formulate a simple game-theoretic relaxation of secret sharing for arbitrary access structures. We prove the optimality of our analysis for threshold secret sharing with respect to this method and point out a general limitation.
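For readers unfamiliar with the scheme whose share size is discussed above, Shamir's t-out-of-n sharing can be sketched as polynomial evaluation over a prime field. This is an illustrative sketch only, not production cryptography; the field size p below is an arbitrary choice, and each share is one field element, i.e. roughly log p bits, consistent with the lower bounds in the abstract:

```python
import random

def shamir_share(secret, t, n, p=2**61 - 1):
    """t-out-of-n Shamir sharing over GF(p): pick a random degree-(t-1)
    polynomial with constant term = secret, and hand out evaluations at x = 1..n."""
    coeffs = [secret] + [random.randrange(p) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, k, p) for k, c in enumerate(coeffs)) % p)
            for x in range(1, n + 1)]

def shamir_reconstruct(shares, p=2**61 - 1):
    """Lagrange interpolation at x = 0 from any t of the shares."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

shares = shamir_share(secret=1, t=3, n=5)
assert shamir_reconstruct(shares[:3]) == 1  # any 3 of the 5 shares recover the bit
```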

August 21, 2016 08:44 PM


CSE graduation project

I have to work on cloud computing for my graduation project. It will be a cloud-based application, not research. I've read about combining embedded systems with clouds, but I can't think of a cloud-based embedded application that can benefit from the cloud. Are there any good ideas for such a project, or any other cloud-based application?

by A.M.H 12 at August 21, 2016 08:16 PM

Planet Emacsen

Grant Rettke: Emacs-wgrep Provides Writable Grep Buffers That Apply The Changes To The Files

Emacs-wgrep provides writable grep buffers that apply your changes back to the underlying files.

The idea is intuitive and familiar if you already like editable dired buffers.

by Grant at August 21, 2016 08:14 PM


What is the definition of a pattern in Rust and what is pattern matching? [on hold]

I am a programmer who is very familiar with languages like C and C++, but I have very little experience with functional languages. I am attempting to learn Rust and would like to know how Rust defines a pattern, and what pattern matching with a match expression is in Rust.

by TheRenegade at August 21, 2016 08:10 PM

Planet Emacsen

Grant Rettke: It Is Time To Migrate from grep to ag

ag is fast, does what you expect, and works well with Emacs. Maybe it is time for you to switch.

by Grant at August 21, 2016 07:45 PM


Creating a Beta-Neutral Portfolio

Given a portfolio of assets (say 10) and a trading signal for each (1 = long, -1 = short, 0 = no position):

      ___________________   Day Count  ______________________

  Asset |0|1|2|3|4|5|6|7|8|9|10|11| ... |30|31|32|33|34|35| ...
1. IBM  |1|1|1|1|1|1|1|0|0|0| 0| 0| ... | 0|-1|-1|-1| 0| 0| ... 
2. APPL |0|0|0|1|1|1|1|1|0|0|-1|-1| ... |-1| 0| 0| 0| 0| 0| ...
    :                        :                 :   
    :                        :                 :
10.TSLA |0|0|0|0|0|1|1|1|0|0| 0| 0| ... | 0|-1|-1|-1| 0| 0| ...

The trading signal can be read as follows:

  1. IBM : Buy on Day0 and Sell on Day7; then Short on Day31 and Buy back on Day34, and so on.
  2. APPL: Buy on Day3 and Sell on Day8; then Short on Day10 and Buy back on Day31, and so on
  3. TSLA: Buy on Day5 and Sell on Day8; then Short on Day31 and Buy back on Day34, and so on.

My question is: given that the rebalancing times are not fixed and that on some days there are long-only or short-only positions, how can one make this portfolio beta-neutral?
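One common piece of the answer is dollar-beta bookkeeping: at every signal change, recompute the portfolio's dollar beta from the current (possibly one-sided) positions and offset it with an index instrument. A minimal sketch, where the betas and position sizes are invented for illustration:

```python
import numpy as np

# Hypothetical per-asset betas (vs. a market index) and current dollar
# positions; signs follow the signal: positive = long, negative = short.
betas = np.array([1.1, 1.3, 0.9])            # e.g. IBM, AAPL, TSLA
positions = np.array([10_000.0, -5_000.0, 8_000.0])

# Portfolio dollar beta: sum of beta_i * dollar position_i.
portfolio_beta_dollars = float(betas @ positions)

# Offset with an index instrument of beta ~ 1 (e.g. index futures),
# resizing the hedge whenever the signal changes the positions.
hedge_notional = -portfolio_beta_dollars
assert abs(betas @ positions + hedge_notional * 1.0) < 1e-9  # net beta is zero
```

The hedge must be recomputed on each rebalance date implied by the signal, which handles the irregular timing; days with long-only or short-only books simply produce a larger hedge notional.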

by labrynth at August 21, 2016 07:43 PM


Is it inefficient to use Unity to turn 32kb of Javascript into a mobile app? Are there alternatives?

Q: What is Unity actually doing? Is it simply wrapping a shell around the code, or compiling it into lower level code?

Q: Is it possible to just make a simple app that is merely a browser, and run the JS game code in that?

Part of the reason I ask is that my game code is <32kb and all I need is a menu and a way to connect to a server for PvP.

I tried a Unity-built Minesweeper clone, which is a similarly simple, grid-based puzzle that actually has fewer in-game elements and much less complexity. It took forever to load. The project has ~50x more objects than I have functions, and the project files take up 10,000x the space.

Unity seems more like a graphics engine, while my code expresses a simple, abstract, non-trivial combinatorial game.

by cybermike at August 21, 2016 07:40 PM


An example on how to perform nearest neighbor search after dimensionality reduction [on hold]

I am following MATLAB's implementation of probabilistic principal component analysis (PPCA) for dimensionality reduction. It seeks to relate a p-dimensional observation vector y to a corresponding k-dimensional vector of latent (or unobserved) variables x, which is normal with mean zero and identity covariance. The relationship is

$$y_n = W x_n + \mu + \epsilon, \qquad x_n \sim N(0, I_k), \quad \epsilon \sim N(0, \sigma^2 I_p),$$

where $y_n$ is the row vector of the n-th observed sample of dimension p, and x is the latent/unobserved variable of dimension k (p >> k). It is not clear to me how to perform nearest neighbor search after doing dimensionality reduction using PPCA. It would be of immense help if an example were provided for any data set.
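One possible sketch of the workflow: fit the low-dimensional subspace on the data, project both the data and any query into the k-dimensional space, and do nearest-neighbour search there. The example below uses plain PCA via SVD in NumPy as a stand-in for MATLAB's ppca (which returns an analogous loading matrix W); all names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(200, 10))   # 200 observations, p = 10
query = Y[17]                    # pretend this is a new observation

# Fit the linear subspace (plain PCA via SVD; ppca would give a similar W).
mu = Y.mean(axis=0)
_, _, Vt = np.linalg.svd(Y - mu, full_matrices=False)
W = Vt[:3].T                     # keep k = 3 components (p >> k)

# Key point: project BOTH the training data and the query with the
# same mean and loadings, then search in the reduced space.
Z = (Y - mu) @ W                 # (200, 3) reduced representation
zq = (query - mu) @ W            # (3,) reduced query
nearest = int(np.argmin(np.linalg.norm(Z - zq, axis=1)))
assert nearest == 17             # the query's own row is its nearest neighbour
```

For large data sets, the brute-force distance computation would be replaced by a k-d tree or similar index built on Z, but the projection step is identical.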

by SKM at August 21, 2016 07:39 PM

Unit testing function that calls other function

Say I have the following two functions:

add_five (number) -> number + 2

add_six (number) -> add_five(number) + 1

As you can see, add_five has a bug.

If I now test add_six, it will fail because the result is incorrect, even though add_six's own code is correct.

Imagine you have a large tree of functions calling each other: it would be hard to find out which function contains the bug, because all of the functions would fail (not only the one with the bug).

So my question is: should unit tests fail because of incorrect behaviour (wrong results) or because of incorrect code (bugs)?
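One standard answer is to test each unit in isolation by stubbing its collaborators, so only the test for the buggy function fails. A hedged Python sketch (not the poster's code; the dependency-injection style here is one of several ways to stub, alongside mocking libraries):

```python
def add_five(number):
    return number + 2  # bug: should return number + 5

def add_six(number, add_five=add_five):
    # Taking the collaborator as a parameter lets tests replace it.
    return add_five(number) + 1

# Unit test for add_six: pass a correct stub instead of the real add_five.
def fake_add_five(n):
    return n + 5

assert add_six(1, add_five=fake_add_five) == 7  # add_six's own logic is correct
assert add_five(1) == 3                          # the bug is localized to add_five
```

With this style, add_six's unit test keeps passing, and only add_five's test fails, pointing straight at the bug; the end-to-end (integration) test is then the place where wrong overall results are allowed to fail.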

by Jan-Paul Kleemans at August 21, 2016 06:44 PM


Scheduling a sequence of queue operations that push and pop items at specified times

What is the time complexity of the following problem?


A FIFO is a queue functional unit supporting four commands: PUSH (data to the back of the queue), POP (the head of the queue), PNP (POP the head of the queue and PUSH it to the back), and NOP (do nothing). Each command takes one unit of time to execute.

FIFO code (or a schedule of commands) is a sequence of commands to execute.

Problem Description

We are given $n$ items of data $T_1,\dots,T_n$, and $n$ triplets $(T_1,t^{in}_1,t^{out}_1),\dots,(T_n,t^{in}_n,t^{out}_n)$. $t^{in}_i$ and $t^{out}_i$ identify the time when $T_i$ is PUSHed and POPed respectively. We're guaranteed that $t^{in}_i<t^{out}_i$ for every $i$ and $t^{in}_i,t^{out}_i$ are all unique.

The goal is to produce FIFO code (a schedule of commands) that pushes each $T_i$ at time $t^{in}_i$ and pops it at time $t^{out}_i$, by adding NOP and PNP commands between the given PUSH and POP commands. No extra PUSH or POP commands can be added: the resulting code must contain exactly $n$ PUSHes and $n$ POPs.


Input: $(T_1,2,4)$, $(T_2,1,5)$. A valid output schedule:


  1. PUSH $T_2$
  2. PUSH $T_1$
  3. PNP
  4. POP // T_1
  5. POP // T_2
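A candidate schedule can be checked mechanically. The minimal simulator below (illustrative only; PUSH commands carry their payload as a tuple) confirms that the schedule above pops $T_1$ at time 4 and $T_2$ at time 5:

```python
from collections import deque

def run(code):
    """Execute a FIFO command schedule; returns the items in pop order."""
    q, popped = deque(), []
    for cmd in code:
        if cmd == "POP":
            popped.append(q.popleft())       # remove the head
        elif cmd == "PNP":
            q.append(q.popleft())            # rotate head to the back
        elif cmd != "NOP":                   # ("PUSH", item)
            q.append(cmd[1])
    return popped

# The sample schedule for input (T1,2,4), (T2,1,5):
code = [("PUSH", "T2"), ("PUSH", "T1"), "PNP", "POP", "POP"]
assert run(code) == ["T1", "T2"]
```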

by Daugmented at August 21, 2016 06:36 PM


Portfolio risk analysis in Options & Mixed portfolios

I am currently working on a risk analysis model that is primarily focused on options portfolios, but will likely later be expanded to cover mixed (options, stocks, bonds, futures, etc.) portfolios. This will be used at a non-professional but advanced level to identify overweighted risks and to show how proposed positions would affect the portfolio's risk balance.

The goal is to be able to clearly show risks in a number of scenarios: market move up/down, correction down (with IV shock), individual symbol shocks, etc.

I want to be able to show the effect of these risks on portfolio performance and also on the greeks and the resulting risk profile.

The basic portfolio analysis methods, such as beta weighting and VaR models, seem very limited and have no concept of IV change or the effects of volatility shocks. I could mix some different models, but I still need the basic underlying models to do that.

Could anyone offer some suggestions for a risk modeling framework or even specific analysis techniques that could be used in simulations to get the results I need? At this point, I am searching but finding little that directly applies. Some guidance would be very welcome.

by drobertson at August 21, 2016 06:20 PM




Puppet Lint Plugins – 2.0 Upgrade and new repo

After the recent puppet-lint 2.0 release and the success of our puppet-lint 2.0 upgrade at work it felt like the right moment to claw some time back and update my own (11!) puppet-lint plugins to allow them to run on either puppet-lint 1 or 2. I’ve now completed this and pushed new versions of the gems to rubygems so if you’ve been waiting for version 2 compatible gems please feel free to test away.

Now I’ve realised exactly how many plugins I’ve ended up with I’ve created a new GitHub repo, unixdaemon-puppet-lint-plugins, that will serve as a nicer discovery point to all of my plugins and a basic introduction to what they do. It’s quite bare bones at the moment but it does present a nicer approach than clicking around my github profile looking for matching repo names.

by Dean Wilson at August 21, 2016 05:47 PM

Planet Emacsen

Irreal: Capturing BibTeX Entries with Google Scholar

Brad Collins has a nice post on collecting BibTeX citations. As he notes, there are plenty of articles on how to generate a citation in Org mode from a BibTeX entry but not on how to gather the entries to begin with.

He starts with a simple Org mode template to capture the citation once you retrieve it from Google Scholar. The idea is that you copy it from Google Scholar and paste it into the capture buffer. If you do this a lot, it would be pretty easy to write a bit of Elisp to automatically copy the citation, bring up the capture buffer, and paste the entry into it.

If you're using Firefox or Chrome you can make things easier by installing the Google Scholar button and then follow Collins' workflow. If you're on a Mac using Safari—or, I suppose, on Windows using one of the Microsoft browsers—his basic workflow still works. Just follow these steps:

  1. Go to the Google Scholar page
  2. Search for the paper you're interested in
  3. Click the “Cite” link at the bottom of the article description
  4. Choose BibTeX in the popup
  5. A tab will open with the plain text citation in BibTeX format
  6. Copy and paste the citation as described in Collins' post

If you're writing a lot of papers for school or for work, Collins' method is an easy way to build up your bibliography database. Even if gathering a citation is an occasional thing, knowing how to use Google Scholar to retrieve it is useful.

by jcs at August 21, 2016 05:34 PM


offset randomforestclassifier scikit learn

I wrote a program in python to use a machine learning algorithm to make predictions on data. I use the function RandomForestClassifier from Scikit Learn to create a random forest to make predictions.

The purpose of the program is to predict whether an unknown astrophysical source is a pulsar or an AGN. It trains the forest on known data, where each source is labelled as a pulsar or an AGN, and then makes predictions on unknown data. But it doesn't work: the program predicts that the unknown sources are all pulsars or all AGNs, and it rarely predicts anything different, and even then not correctly.

Below I describe the steps of my program.

It creates a data frame, all_df, with data for all the sources. It is made of ten columns, nine used as predictors and one as the target:


The type column contains the label "pulsar" or "agn" for each source.

The values of predictors and targets are used successively in the program to train the forest.

The program divides the predictors and the targets in two sets, the train, which is the 70% of the total, and the test, which is the 30% of the total of all_df, using the function train_test_split from Scikit Learn:

pred_train, pred_test, tar_train, tar_test=train_test_split(predictors, targets, test_size=0.3)

Data in these sets are mixed, so the program orders the indexes of these sets, without changing data position:


After that, the program creates and trains the random forest:


Now the program makes prediction on the test set:


At this point, the program seems to work.

Now it passes another data frame, with the unknown data, to the forest created above, and I get the bad result described before. Can you help me? The problem could be an offset in RandomForestClassifier, but I got no significant improvement by modifying the RandomForestClassifier options. If you need, I can give further explanations. Thanks in advance.

Bye, Fabio

PS: I tried cross-validation too: I divided the train set into train and test again, with the same proportions (0.7 and 0.3), to create, train and test the forest before testing it on the initial test set, modifying the RandomForestClassifier options to obtain better results, but I saw no improvement.
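When a forest collapses to predicting a single class, one common first experiment is class weighting and inspecting predicted probabilities rather than hard labels. A self-contained sketch on synthetic imbalanced data (not the poster's dataset; the parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in for the pulsar/AGN data: 9 predictors, imbalanced classes.
X, y = make_classification(n_samples=500, n_features=9,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# class_weight="balanced" reweights samples inversely to class frequency,
# which can keep the forest from always predicting the majority class.
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                             random_state=0)
clf.fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)   # inspect probabilities, not just labels
assert proba.shape == (len(X_te), 2)
```

If the unknown sources still all land in one class, it is also worth checking that the unknown data are scaled and ordered column-for-column exactly like the training predictors, since a feature mismatch produces exactly this symptom.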

by user6728339 at August 21, 2016 05:29 PM


Is my implementation of a Disjoint Set fast?

When I'm reading about new data structures I try to just read a little bit about it, and then implement it. Then I read how to actually implement it, which I think gives me better understanding.

Below is my implementation of a disjoint set (I think), and also some testing.

from random import shuffle

class DisjointSets(object):
  def __init__(self, size):
    self.set = list(range(size))

  def find(self, i):
    if self.set[i] != self.set[self.set[i]]:
      self.set[i] = self.find(self.set[i])
    return self.set[i]

  def union(self, i, j):
    self.set[self.find(i)] = self.find(j)

# Create disjoint sets
N = 2 ** 20
s = DisjointSets(N)

# Join all sets
scrambled = list(range(1, N))
shuffle(scrambled)

for i in scrambled:
  s.union(i, i - 1)

# Assert we have one big set
for i in range(N):
  assert s.find(i) == s.find(0)

After benchmarking, union and find seem to be O(1), at least amortized. (I set N to different values between 2^15 and 2^20.) The thing is that I am not using any rank, which I thought was required to get O(1) performance (ignoring the inverse Ackermann factor).
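For comparison, the union-by-rank variant can be sketched as below (with iterative path halving in find; the class name and structure are my own, not a reference implementation). Rank is what guarantees the near-constant amortized bound in the worst case; without it, an adversarial union order can build deep trees:

```python
class DisjointSetsRanked:
    """Union-find with union by rank plus path halving, giving the
    inverse-Ackermann amortized bound per operation."""
    def __init__(self, size):
        self.parent = list(range(size))
        self.rank = [0] * size

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri == rj:
            return
        if self.rank[ri] < self.rank[rj]:
            ri, rj = rj, ri                 # attach the shorter tree
        self.parent[rj] = ri
        if self.rank[ri] == self.rank[rj]:
            self.rank[ri] += 1

s = DisjointSetsRanked(8)
for i in range(1, 8):
    s.union(i, i - 1)
assert all(s.find(i) == s.find(0) for i in range(8))
```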

Any input?

by Marcus Johansson at August 21, 2016 05:24 PM


Machine Learning and Kalman filtering

I have a dataset with a large number of features and some thousands of observations. However, a number of observations are missing, and I want to estimate those missing observations by exploiting the pattern or periodicity of the available data. My questions are:

  1. Is it possible to apply Kalman filtering to estimate those missing values?
  2. Is Kalman filtering used in anomaly detection problems?
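On question 1, yes: a Kalman filter handles gaps naturally by running the predict step at every time step and skipping the measurement update where the observation is missing. A minimal 1D random-walk sketch (illustrative only; the process noise q and measurement noise r below are made-up values that would need tuning to real data):

```python
import numpy as np

def kalman_1d(zs, q=1e-3, r=0.1):
    """Minimal 1D random-walk Kalman filter; np.nan marks a missing
    observation, for which only the predict step runs."""
    x, p = 0.0, 1.0              # state estimate and its variance
    out = []
    for z in zs:
        p += q                   # predict: random-walk process noise
        if not np.isnan(z):      # update only when a measurement exists
            k = p / (p + r)      # Kalman gain
            x += k * (z - x)
            p *= (1 - k)
        out.append(x)
    return out

zs = [1.0, 1.1, np.nan, np.nan, 1.4, 1.5]
est = kalman_1d(zs)
assert len(est) == len(zs)       # an estimate is produced for every step,
                                 # including the missing ones
```

The same skip-the-update idea extends to the multivariate case, and the innovation (z - x) relative to its predicted variance is also the usual Kalman-based anomaly score for question 2.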

Lastly, if I have unlabelled data, how can I train the classifier? My understanding is that for supervised machine learning, example data with labels is available; from that example dataset we train the classifier and use it for classifying unseen observations and for outlier detection. I have one dataset that doesn't have outliers, but it has no labels. There is another dataset with the same features that does contain outliers; it also contains a much larger number of observations than the uncorrupted dataset. How should I tackle the outlier detection problem in this case?

by Umar at August 21, 2016 05:14 PM



How to display the premise and consequence when the setCar is set to true?

I want to get the premise and consequence for each line of the generated rules after running the Apriori algorithm in Weka 3.8.0.




I tried the code below to get the rules but it gives me an exception (weka.associations.ItemSet cannot be cast to weka.associations.AprioriItemSet):

        AssociationRules arules = apriori.getAssociationRules();

Also, I tried using the getAllTheRules() method but it gives me a different result.

    ArrayList<Object>[] arules = apriori.getAllTheRules();
    System.out.println(((ItemSet)arules[0].get(1)).getRevision()); //12014
    System.out.println(((ItemSet)arules[0].get(2)).getRevision()); //12014
    System.out.println(((ItemSet)arules[0].get(5)).getRevision()); //12014

by Hoo at August 21, 2016 04:30 PM



computer networks and graph theory [on hold]

What is the fastest way to find the distance between 2 nodes, if the intermediate nodes change during runtime?

by Vibha at August 21, 2016 03:31 PM



Getting only NaN when using Tensorflow's C++ API

So I have this function in my code:

tensorflow::Tensor predictV1(tensorflow::Session* sess, tensorflow::Tensor X) {
  assert(X.dim_size(1) == 480);
  assert(X.dim_size(2) == 640);
  assert(X.dim_size(3) == 3);
  tensorflow::Tensor keep_prob(tensorflow::DT_FLOAT, tensorflow::TensorShape());
  keep_prob.scalar<float>()() = 1.0;

  std::cout << "Created keep_prob" << std::endl;
  std::vector<std::pair<std::string, tensorflow::Tensor>> inputs = {
    { "x", X },
    { "keep_prob", keep_prob },
  };
  std::cout << "Created the input vector" << std::endl;
  std::vector<tensorflow::Tensor> outputs;
  // Run the session, evaluating our "y" operation from the graph
  tensorflow::Status status = sess->Run(inputs, {"y"}, {"y"}, &outputs);
  if (!status.ok()) {
    std::cout << status.ToString() << "\n";
    return X;
  }
  tensorflow::Tensor Y = outputs[0];
  auto T_M = Y.tensor<float, 4>();
  return Y;
}

Now sess is created by the LoadGraph function in TF's C++ documentation here. So I create a frozen graph (using the freeze_graph script in TF) and feed it to LoadGraph, which in turn I feed to this function.

When I run the program, all values of Y are -nan. So I went back and tested it in Python, and the problem doesn't seem to occur there. (Note: the graph is created by this script.)

So where does the error come from?

EDIT: I forgot the cnn_functions

by Alperen AYDIN at August 21, 2016 02:51 PM


SEC 10-Q/K Filings

I am working on some research that requires parsing SEC 10-K/Q filings. We have built a parser that parses the raw txt SEC filing, which usually contains many blocks of unencoded files (html, xml, pdfs, images, spreadsheets, etc.). A typical decoded 10-K/Q (as of CY 2014) has a set of files that looks like the following:

10 K/Q Payload

Does anyone have any documentation or guidance that explains what the R1.htm - RX.htm files are supposed to contain, and more broadly any documentation that describes what is typically found in a decoded 10-K/Q? The SEC doesn't have any documentation at this level of granularity. (The reason being that the submission exemplified above may come from a particular filing prep vendor / software; however, this format seems to be the most pervasive as of CY2014.)

Thank you in advance for any guidance.

by Jon Firuz at August 21, 2016 02:18 PM


Functional Programming exercise with Scala

I have recently started reading the book Functional Programming in Scala by Paul Chiusano and Rúnar Bjarnason, as a means to learn FP. I want to learn it because it will open my head a bit, twist my way of thinking and also hopefully make me a better programmer overall, or so I hope.

In their book, Chp. 3, they define a basic singly-linked-list type as follows:

package fpinscala.datastructures

sealed trait List[+A]
case object Nil extends List[Nothing]
case class Cons[+A](head: A, tail: List[A]) extends List[A]

object List {
    def sum(ints: List[Int]): Int = ints match {
        case Nil => 0
        case Cons(x,xs) => x + sum(xs)
    }

    def product(ds: List[Double]): Double = ds match {
        case Nil => 1.0
        case Cons(0.0, _) => 0.0
        case Cons(x,xs) => x * product(xs)
    }

    def apply[A](as: A*): List[A] =
        if (as.isEmpty) Nil
        else Cons(as.head, apply(as.tail: _*))
}

I'm now working on implementing the tail method, which should work similarly to the tail method defined in the Scala standard library. I guess the idea here is to define a tail method inside the List companion object and then call it normally from another file (like a Main file).

So far, I have this:

def tail[A](ls: List[A]): List[A] = ls match {
    case Nil => Nil
    case Cons(x,xs) => xs
}

Then I created a Main file in another folder:

package fpinscala.datastructures

object Main {
    def main(args: Array[String]): Unit = {
        println("Hello, Scala !! ")
        val example = Cons(1, Cons(2, Cons(3, Nil)))
        val example2 = List(1,2,3)
        val example3 = Nil
        val total = List.tail(example)
        val total2 = List.tail(example3)
    }
}

This works and gives me:

Hello, Scala !! 

My question is:

Is this the correct way to write the tail method, possibly as the authors intended? And is this package structure correct? It feels very wrong to me, although I just followed the authors' package.

I also don't know whether I should have used a specific type instead of writing a polymorphic method (is that the right name?)...

Bear with me, for I am a newbie in the art of FP.

by Bruno Oliveira at August 21, 2016 02:02 PM


Why does one get a delta greater than 1 when using the likelihood estimator?

I'm using the likelihood estimator as derived in this pdf to calculate the delta of a European call option. However, I consistently get a delta exceeding one. I'm taking 10,000 samples of Z to calculate the delta.



by Nisarg Kamdar at August 21, 2016 01:41 PM

Is there a specific meaning to the word "convoluted" in maths or mathematical finance?

I'm reading about copula estimation in the book Financial Modeling Under Non-Gaussian Distributions by Jondeau, Poon and Rockinger. They say that full maximum likelihood can be difficult because of i) dimensionality and because ii) "the copula parameter may be a "convoluted expression" of the margins parameter."

I've been looking for a definition of that word, but I only find "difficult" or "complicated" as synonyms of convoluted. I believe there is more to it than just "complicated". Is there a more specific meaning for this word? Do you know what the authors mean by "convoluted expression" in this context?

by Kondo at August 21, 2016 01:35 PM