Planet Primates

September 05, 2015



Space requirements for solving True Quantified Boolean Formulas problem

I came across this section on the Wikipedia page for the TQBF problem, and just can't wrap my head around the fact that the space requirement is linear. Moreover, the page does not provide any reference.

For a given QBF $$Q_1x_1Q_2x_2...Q_nx_n\phi(x_1,x_2,...,x_n)$$

for the outermost quantifier, the algorithm presented evaluates

$$ A = Q_2x_2...Q_nx_n\phi(0,x_2,...,x_n)$$ $$B = Q_2x_2...Q_nx_n\phi(1,x_2,...,x_n)$$

and returns $A\wedge B$ if $Q_1 = \forall$ and $A\lor B$ if $Q_1 = \exists$. The way I work it out, the space should be $O(2^n)$ instead of $O(n + \log(n)) = O(n)$. When I solve a QBF on paper, I realize it is not exponential, so something must be terribly wrong with my analysis.

At each step, the new $A$ and $B$ have one quantifier fewer, but there are twice as many expressions. I assumed that would build up as $$ 1: n$$ $$ 2 : (n-1) + (n-1) $$ $$ 3 : (n-2) + (n-2) + (n-2) + (n-2) $$ $$ \ldots $$ $$ k : 2^k (n-k) $$

which goes all the way up to $2^{n-1}$ before decreasing to zero. Can anyone give me a pointer to what I am doing wrong, or, even better, a good reference?
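The resolution of the puzzle is that the two recursive calls are made one after the other, not side by side: at any moment only one root-to-leaf path of the call tree is alive. A minimal Python sketch (my own illustration, not from the Wikipedia article) makes this visible — the recursion depth, and hence the space, is O(n), even though the running time is O(2^n):

```python
def eval_qbf(quantifiers, phi, assignment=()):
    """Evaluate Q1 x1 ... Qn xn . phi recursively.

    quantifiers: list of 'A' (forall) / 'E' (exists), one per variable.
    phi: a function taking a tuple of n booleans.
    The second branch only starts after the first returns, so only one
    root-to-leaf path is in memory: space O(n), not the O(2^n) size of
    the full call tree (which is traversed over *time*, not held in space).
    """
    if not quantifiers:
        return phi(assignment)
    q, rest = quantifiers[0], quantifiers[1:]
    a = eval_qbf(rest, phi, assignment + (False,))  # x_i = 0 branch
    b = eval_qbf(rest, phi, assignment + (True,))   # x_i = 1 branch
    return (a and b) if q == 'A' else (a or b)

# forall x exists y: x != y  -- true
print(eval_qbf(['A', 'E'], lambda v: v[0] != v[1]))  # True
```

The $2^k$ expressions at level $k$ in the analysis above all exist, but never simultaneously: each is computed, reduced to a single bit, and discarded before the next one starts.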


by Nicolas Mattia at September 05, 2015 01:58 PM


In a DFA, does every state have a transition on every symbol of the alphabet?

If not, then what does it mean when for some state $q$ and some symbol $a$, $\delta(q, a)$ does not exist?
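For concreteness, here is a small Python sketch (the names and the convention are my own, not from any specific textbook) of the usual reading: a missing transition can be treated as a move into an implicit non-accepting "dead" state, so the word is simply rejected. Equivalently, every partial DFA can be completed by adding one trap state with self-loops on every symbol:

```python
def accepts(delta, start, accepting, word):
    """Run a (possibly partial) DFA given as a dict {(state, symbol): state}.
    A missing transition is treated as a move to an implicit dead state:
    the rest of the word can never reach an accepting state, so reject."""
    state = start
    for symbol in word:
        if (state, symbol) not in delta:
            return False  # delta(q, a) undefined -> implicit reject
        state = delta[(state, symbol)]
    return state in accepting

# DFA over {a, b} accepting words of the form a b* (no 'a'-transition from q1)
delta = {('q0', 'a'): 'q1', ('q1', 'b'): 'q1'}
print(accepts(delta, 'q0', {'q1'}, 'abb'))  # True
print(accepts(delta, 'q0', {'q1'}, 'aba'))  # False
```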

by drowse at September 05, 2015 01:56 PM


How to avoid the use of mutable data structures and use a more functional approach?

I'm building a database in Scheme on top of a wiredtiger key/value store.

To query a given table one needs a cursor over that table. The library recommends re-using cursors. The general behavior can be described by the following pseudo-code:

with db.cursor() as cursor:
    # ... query the table via cursor ...
During the extent of the with statement, cursor may only be used in the current context. If do_something(db) needs a cursor, it must create/retrieve another one, even if it queries the same table; otherwise the cursor loses its position and the rest of do_something(db) does not behave as expected.

You can work around this by always resetting the cursor, but that's wasteful. Instead it's preferable to keep a set of cursors ready to be used: a request via db.cursor() removes a cursor from the available set and returns it, and once the "context" operation is finished, the cursor is put back.

The way I solve this in Python is by using a list. db.cursor() looks like:

def cursor(self):
    cursor = self.cursors.pop()      # take a cursor from the pool
    try:
        yield cursor                 # lend it to the with-block
    finally:
        self.cursors.append(cursor)  # put it back when the context ends

Which means: retrieve a cursor, lend it to the current context, and once the context is finished, put it back into the list of available cursors.

How can I avoid the mutation and use a more functional approach?
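One functional alternative (sketched in Python for concreteness, since the Python version is quoted above; the same shape carries over directly to Scheme) is to thread the pool through the computation explicitly: a higher-order function checks a cursor out, runs the body with it, and returns the result together with the new pool, with no mutation anywhere:

```python
def with_cursor(pool, body):
    """Functional cursor checkout: `pool` is an immutable tuple of cursors.
    Take one cursor, run `body` with it, and return (result, new_pool).
    Nothing is mutated; the 'state' is threaded through return values."""
    cursor, rest = pool[0], pool[1:]   # check the cursor out
    result = body(cursor)              # run the context body
    return result, rest + (cursor,)   # check it back in

# Hypothetical stand-ins for real wiredtiger cursors:
pool = ('cursor-1', 'cursor-2')
result, pool = with_cursor(pool, lambda c: f"queried with {c}")
print(result)  # queried with cursor-1
```

The price is that every caller has to accept and return the pool (or hide that plumbing in a state monad), which is the usual trade-off for banishing the mutable list.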

by amirouche at September 05, 2015 01:47 PM



Is algebraic dependency decidable?

A set of numbers $S=\{x_1,...,x_n\}$ is said to be algebraically dependent if there exists a nonzero (multivariate) polynomial $p$ with coefficients in $\mathbb Q$ such that $p(x_1,...,x_n)=0$ (or this holds for some subset of $S$).

Given numbers $S=\{x_1,...,x_n\}$, is it decidable whether they are algebraically dependent?

Now, there is obviously an encoding issue here, since the numbers need to be transcendental, so describing them is problematic. Any solution would do, but specifically, I am interested in the case where all the numbers are of the form $e^{a_i}$, where $a_i$ is rational (so we can describe the numbers by $a_i$).

The motivation for this question is understanding whether a (positive) solution to Schanuel's conjecture would automatically lead to a constructive solution.

by Shaull at September 05, 2015 01:31 PM


Can an algorithm that yields $O(n^2)$ answers run in $O(n)$ time?

My question may actually be more broadly stated as: can I use the fact that an algorithm is expected to return $O(f(n))$ answers to show that it can never run faster than $O(f(n))$? I would intuitively assume yes, but I am not sure.

An example of what I'm talking about is an algorithm to calculate all the paths that pass through a set of points. I can easily calculate an upper bound $O(f(n))$ on how many such paths there are. Will that tell me the algorithm can be no better than $\Omega(f(n))$?
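The intuition can be made concrete with a toy example (my own illustration): an algorithm that must emit all answers pays at least the size of its output, because merely writing the answers down takes that many steps. Note this needs a *lower* bound on the worst-case output count, not just an upper bound:

```python
def all_pairs(points):
    """Emit every unordered pair of points. There are n(n-1)/2 answers,
    so merely writing the output already costs Omega(n^2) time --
    no algorithm producing all of them can beat its own output size."""
    n = len(points)
    return [(points[i], points[j]) for i in range(n) for j in range(i + 1, n)]

pairs = all_pairs([1, 2, 3, 4])
print(len(pairs))  # 6 = 4*3/2
```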


by devoured elysium at September 05, 2015 01:29 PM


Fred Wilson

Video Of The Week: A Discussion Of Accelerators

The economist Ian Hathaway did a public video hangout with Brad Feld this week and posted it on YouTube. They cover a lot of interesting territory in and around the startup accelerator phenomenon.

by Fred Wilson at September 05, 2015 12:46 PM


What is the status of intermediate problems if P is not NP in worst way imaginable?

Assume $P\neq BPP\neq NP$, with the caveat that every $NP$-complete problem with input size $n$ bits has a deterministic algorithm running in $2^{(\log n)^{1+f(n)}}$ arithmetic operations on $\log n$-sized words, where $f(n)$ grows slower than any definable function.

Where does the complexity of NP-intermediate problems stand in this case, and what is the utility of randomness?

What is the scenario if $P= BPP\neq NP$ or $P\neq BPP=NP$ holds?

by Turbo at September 05, 2015 12:41 PM

DragonFly BSD Digest


Exercise 2.2 from the book "The Concepts and Practice of Mathematical Finance"

I am a newbie. Please help me understand how to solve exercise 2.2 from the book "The Concepts and Practice of Mathematical Finance". The solution from the book says that our super-replicating portfolio will be $\alpha$ shares and $\beta$ bonds, and that it must dominate at zero, which implies $\beta \ge 0$. First of all, what does "it must dominate at zero" mean? Secondly, why does dominating at zero imply $\beta \ge 0$? Thanks so much for your help!



by dullboy at September 05, 2015 12:21 PM


How to configure which user fcgiwrap runs as on FreeBSD?

I have Redmine/Git/nginx/fcgiwrap running in a jail on FreeBSD 9.3 for (potentially) authenticated Git commits over HTTP/S. Everything works until I restart the jail.

In order for a commit to work I need to manually change /var/run/fcgiwrap/fcgiwrap.sock from srwxr-xr-x root:wheel to srwxrwxr-x root:www.

It seems like there should be a better way to do this so that it's persistent across reboots. My feeling is that there should be some way of telling fcgiwrap which user to run as, but I can't work out where this is specified on FreeBSD.

The man page says:

Most probably you will want to launch fcgiwrap by spawn-fcgi using a configuration like this:


Based on this question I have looked in /usr/local/etc/rc.d for spawn-fcgi but it's not there, which I assume means it's not installed. It also seems overkill to install spawn-fcgi just to manage which user fcgiwrap runs as.

I've found that /usr/local/etc/rc.d/fcgiwrap says:

# fcgiwrap rc.d script supports multiple profiles (a-la rc.d/nginx)
# When profiles are specified, the non-profile specific parameters become defaults.
# You need to make sure that no two profiles have the same socket parameter.

What is a profile and how would I go about creating one for this rc.d script? Or am I going about this the wrong way?

by Styne666 at September 05, 2015 12:09 PM


Why does QuantLib compute the fixed-leg swap rate by this formula?

I'm trying to understand how QuantLib creates (bootstraps) a yield curve from a vanilla swap at the source level. I have the following test code:

void testYieldFromIRSwap() {
    Settings::instance().evaluationDate() = Date(1, Jan, 2015);

    auto dc = Actual360();

    auto h1 = boost::shared_ptr<RateHelper>(new DepositRateHelper
                  (0.03, Period(1, Years), 0.0, UnitedStates(), Following, false, Actual360()));
    auto h2 = boost::shared_ptr<RateHelper>(new DepositRateHelper
                  (0.04, Period(2, Years), 0.0, UnitedStates(), Following, false, Actual360()));

    auto index = boost::shared_ptr<IborIndex>(new EURLibor1Y());
    auto h3 = boost::shared_ptr<RateHelper>(
                  new SwapRateHelper(0.05, Period(3, Years), UnitedStates(), Annual,
                                     Following, Actual360(), index));

    std::vector<boost::shared_ptr<RateHelper>> helpers;
    helpers.push_back(h1);
    helpers.push_back(h2);
    helpers.push_back(h3);

    boost::shared_ptr<YieldTermStructure> yield(
                  new PiecewiseYieldCurve<Discount,LogLinear>(
                      Settings::instance().evaluationDate(), helpers, dc));

    const auto t1 = dc.yearFraction(Date(1, Jan, 2015), Date(1, Jan, 2016)); // 1.014
    const auto t2 = dc.yearFraction(Date(1, Jan, 2015), Date(1, Jan, 2017)); // 2.031
    const auto t3 = dc.yearFraction(Date(1, Jan, 2015), Date(1, Jan, 2018)); // 3.044

    std::cout << yield->discount(0)  << std::endl; // Must be 1
    std::cout << yield->discount(t1) << std::endl; // 1/(1+0.03)^1.014 = 0.9704721
    std::cout << yield->discount(t2) << std::endl; // 1/(1+0.04)^2.031 = 0.9234328
    std::cout << yield->discount(t3) << std::endl;
}

This is obviously a toy example. I have two deposits that give the discount rates for the first two years. My swap is an annual 3y swap, and I expect the swap to give me the discount rate for the third year. I do get the value, but I don't understand how it's computed.

The code for bootstrapping a discount factor from a swap is:

Real SwapRateHelper::impliedQuote() const {
    QL_REQUIRE(termStructure_ != 0, "term structure not set");
    // we didn't register as observers - force calculation
    // weak implementation... to be improved
    static const Spread basisPoint = 1.0e-4;
    Real floatingLegNPV = swap_->floatingLegNPV();
    Spread spread = spread_.empty() ? 0.0 : spread_->value();
    Real spreadNPV = swap_->floatingLegBPS()/basisPoint*spread;
    Real totNPV = - (floatingLegNPV+spreadNPV);
    Real result = totNPV/(swap_->fixedLegBPS()/basisPoint);
    return result;
}
Every time QuantLib guesses a new discount factor (i.e. at each step of the iterative bootstrap), this function computes a new NPV for the floating leg. I'd expect this NPV to be set equal to the NPV of the fixed leg, from which a new swap rate can be computed.

What I don't understand is this particular line:

Real result = totNPV/(swap_->fixedLegBPS()/basisPoint);

I think BPS is the sum of the accrued amounts over the fixed-leg coupons and is computed by:

bps += cp->nominal() * cp->accrualPeriod() * df;


Why do we need to divide totNPV by fixedLegBPS()/basisPoint? Why is basisPoint set to 1.0e-4 (one basis point)? What's the point of doing that?
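As far as I can tell (this is a reading of the code, sketched in Python with made-up numbers, not QuantLib itself), fixedLegBPS() is the change in the fixed leg's NPV for a one-basis-point move in the rate, i.e. annuity × 0.0001. Dividing it by basisPoint therefore recovers the annuity, and totNPV / annuity is exactly the par rate that makes the fixed leg's NPV equal the floating leg's:

```python
def fair_swap_rate(float_npv, accruals, discounts, nominal=1.0):
    """Toy illustration of SwapRateHelper::impliedQuote.

    fixedLegBPS is the fixed leg's NPV change for a 1bp rate move:
        BPS = sum(nominal * accrual_i * df_i) * 0.0001
    Dividing BPS by one basis point recovers the annuity, and the
    par rate is the rate that makes fixed NPV == floating NPV:
        rate * annuity = float_npv  =>  rate = float_npv / annuity
    """
    basis_point = 1e-4
    annuity = sum(nominal * a * df for a, df in zip(accruals, discounts))
    bps = annuity * basis_point             # what fixedLegBPS() reports
    return float_npv / (bps / basis_point)  # == float_npv / annuity

# Hypothetical 3y annual schedule; all numbers are made up for illustration.
rate = fair_swap_rate(float_npv=0.13, accruals=[1.0, 1.0, 1.0],
                      discounts=[0.97, 0.92, 0.87])
print(round(rate, 4))  # 0.0471
```

So the division by basisPoint is just undoing the 1bp scaling that the BPS convention bakes in; nothing deeper seems to be going on.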

by Student T at September 05, 2015 12:08 PM


Expressing iterative version of fold in terms of recursive version

1st definition. Recursive definition of fold:

fold$_{recur}$ (c, h) nil = c

fold$_{recur}$ (c, h) (cons(a, x)) = h(a, fold$_{recur}$ (c, h) x)

2nd definition. Iterative definition of fold:

fold$_{itr}$ (c, h) nil = c

fold$_{itr}$ (c, h) (cons(a, x)) = fold$_{itr}$ (h(c, a), h) x

How do I express fold$_{itr}$ in terms of fold$_{recur}$?
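The standard identity (sketched here in Python, with lists standing in for cons cells) is to have fold$_{recur}$ build a *function* from accumulator to result, seeded with the identity, and finally apply that function to c:

```python
def fold_recur(c, h, xs):
    """Right fold: h(x1, h(x2, ... h(xn, c)))."""
    if not xs:
        return c
    return h(xs[0], fold_recur(c, h, xs[1:]))

def fold_itr(c, h, xs):
    """Left (iterative) fold expressed via fold_recur: the right fold
    combines each element a with a continuation g into a new function
    of the accumulator, and the final function is applied to c."""
    step = lambda a, g: (lambda acc: g(h(acc, a)))
    return fold_recur(lambda acc: acc, step, xs)(c)

# Left fold of subtraction: ((10 - 1) - 2) - 3 = 4
print(fold_itr(10, lambda acc, a: acc - a, [1, 2, 3]))  # 4
```

In the notation of the question: fold$_{itr}$ (c, h) x = (fold$_{recur}$ (id, λ(a, g). λacc. g(h(acc, a))) x) c.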

by Abdul Rahman at September 05, 2015 11:40 AM


Convert three-month interbank rate into monthly rate

I have a monthly time series of the three-month interbank rate, and I assume the rate is quoted on an annualized basis.

I need these interbank rates on a monthly basis because I want to use them as a proxy for the risk-free rate, so I can subtract them from my monthly returns to get excess returns.

Let's consider an example:

Assume the monthly return of stock x in April is 6% and the interbank rate at the end of April is 3% annualized. Can I simply subtract 0.25% (3%/12) from 6%, resulting in an excess return of 5.75%?
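A small arithmetic check (my own illustration): dividing by 12 is the linearized conversion; the compounding-consistent monthly rate is (1 + r)^(1/12) − 1, and at 3% the two differ only in the fifth decimal place, so the simple subtraction is usually an acceptable approximation:

```python
annual = 0.03

# Simple (linear) de-annualization used in the question:
simple_monthly = annual / 12

# Geometrically consistent monthly rate (compounding-aware):
compound_monthly = (1 + annual) ** (1 / 12) - 1

print(f"{simple_monthly:.6f}")    # 0.002500
print(f"{compound_monthly:.6f}")  # 0.002466
```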

Any help would be much appreciated.

by vomicha at September 05, 2015 11:40 AM


xah-fly-keys, an efficient keyset for emacs

xah-fly-keys is now on MELPA.

xah-fly-keys is an efficient keyset for emacs. It is a modal mode, like vi, but key choices are based on statistics of command call frequency and on an ease-of-press score for each key position.

It does not bind any Control or Meta keys (except three, which you can turn off).

Learning xah-fly-keys is like learning vi for the first time: everything is new. You'll need about a month to adapt. (But you can still use emacs as-is in the meantime, because no Control or Meta bindings are used. Just leave it in insertion mode.)

xah-fly-keys is currently optimized for Dvorak layout only. If you touch-type QWERTY or other, you will need to rebind keys.

Give it a shot. Bugs can be posted to GitHub. Questions can be posted as comments on the home page, or here. Thanks.

submitted by xah

September 05, 2015 11:28 AM


null safety

I've never tried Kotlin and I don't think I will because it doesn't have a type system interesting enough for me, but its idea of null safety might be worth exploring.

The only problem I can see is that the compiler must be smart enough to know when a pointer is not null and can be dereferenced. Safety requires that the compiler be conservative: "When in doubt, give a compilation error!".

Option[T] is an imperfect solution because it doesn't enforce safety. If one forgets to wrap a reference in Option, or mistakenly assumes that a function can't return null and therefore doesn't use Option, an NPE may be thrown at runtime.

Do you think it'd be useful to have something similar to Kotlin's null safety feature in Scala or not?

submitted by Kiuhnm

September 05, 2015 11:19 AM

Planet Theory

TR15-145 | Arithmetic circuit classes over Zm | Eric Allender, Asa Goodwillie

We continue the study of the complexity classes VP(Zm) and ΛP(Zm) which was initiated in [AGM15]. We distinguish between "strict" and "lax" versions of these classes and prove some new equalities and inclusions between these arithmetic circuit classes and various subclasses of ACC^1.

September 05, 2015 11:06 AM


Did you hear that too? We're taking in sooo many refugees ...

Did you hear that too? We're taking in sooo many refugees? It's totally overwhelming our infrastructure and resources? Really?

Well, then take a look at Jordan.

One and a half million refugees in a country of roughly six million inhabitants – proportionally, that would mean Germany taking in twenty million refugees. And Jordan is a resource-poor developing country; its per-capita income is only a quarter of Germany's.

That there are no riots there, or even protests, borders on a miracle.

September 05, 2015 11:00 AM


Functional programming and dependency inversion: how to abstract storage?

I'm trying to create a solution that has a lower-level library that will know that it needs to save and load data when certain commands are called, but the implementation of the save and load functions will be provided in a platform-specific project which references the lower-level library.

I have some models, such as:

type User = { UserID: UserID
              Situations: SituationID list }

type Situation = { SituationID: SituationID }

And what I want to do is be able to define and call functions such as:

do saveUser ()
let user = loadUser (UserID 57)

Is there any way to define this cleanly in the functional idiom, preferably while avoiding mutable state (which shouldn't be necessary anyway)?

One way to do it might look something like this:

type IStorage = {
    saveUser: User->unit;
    loadUser: UserID->User }

module Storage =
    // initialize save/load functions to "not yet implemented"
    let mutable storage = {
        saveUser = (fun _ -> failwith "nyi");
        loadUser = (fun _ -> failwith "nyi") }

// ....elsewhere:
do Storage.storage <- { (* a real implementation of IStorage *) }
do Storage.storage.saveUser ()
let user = Storage.storage.loadUser (UserID 57)

And there are variations on this, but all the ones I can think of involve some kind of uninitialized state. (In Xamarin, there's also DependencyService, but that is itself a dependency I would like to avoid.)

Is there any way to write code that calls a storage function, which hasn't been implemented yet, and then implement it, WITHOUT using mutable state?

(Note: this question is not about storage itself -- that's just the example I'm using. It's about how to inject functions without using unnecessary mutable state.)
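One standard mutation-free shape (sketched in Python for brevity; the record-of-functions idea translates directly to F#) is to pass the storage bundle into the lower-level functions as an ordinary argument, so nothing ever needs to be initialized to "not yet implemented". All names below are hypothetical:

```python
from typing import Callable, NamedTuple

class Storage(NamedTuple):
    """Bundle of storage operations, supplied by the platform layer."""
    save_user: Callable[[dict], None]
    load_user: Callable[[int], dict]

def handle_command(storage: Storage, user_id: int) -> dict:
    """Lower-level logic: it never knows *how* storage works, only the
    interface. No mutable global slot, no uninitialized state."""
    return storage.load_user(user_id)

# Platform-specific project supplies the real implementation:
db = {57: {"user_id": 57, "situations": []}}
storage = Storage(save_user=lambda u: db.__setitem__(u["user_id"], u),
                  load_user=lambda uid: db[uid])
print(handle_command(storage, 57))  # {'user_id': 57, 'situations': []}
```

The cost is an extra parameter threaded through every caller; the usual mitigations are partial application at the composition root, or a reader-monad-style wrapper if the threading becomes noisy.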

by Overlord Zurg at September 05, 2015 10:43 AM


FreeBSD - No internet connection inside jail

I am currently trying to set up a server on FreeBSD. I want to create jails with ezjail according to the Handbook. I am following the example to install BIND inside a jail, but I am stuck at the installation step (make -C /usr/ports/dns/bind99 install clean).

At first I thought I had a DNS problem (bad /etc/resolv.conf) but it seems that I have simply no internet inside the jail.

On the host: ( is

root@varda:~ # nc -z -w 2 80
Connection to 80 port [tcp/http] succeeded!

Inside the jail:

root@dns:~ # nc -z -w 2 80; echo $?

Any idea what is going on?

root@varda:~ # ezjail-admin list
STA JID  IP              Hostname                       Root Directory
--- ---- --------------- ------------------------------ ------------------------
DR  2     www                            /home/jails/www
    2    lo1|
DR  3     dns                            /home/jails/dns
    3    lo2|

(www is another jail in which I have the same problem)

root@varda:~ # ifconfig
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 00:22:4d:ad:be:2a
    inet ???.???.???.??? netmask 0xffffff00 broadcast ???.???.???.??? 
    inet6 fe80::222:4dff:fead:be2a%em0 prefixlen 64 scopeid 0x1 
    inet6 2001:41d0:a:f231::1 prefixlen 128 
    inet netmask 0xffffffff broadcast 
    inet netmask 0xffffffff broadcast 
media: Ethernet autoselect (100baseTX <full-duplex>)
    status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    inet6 ::1 prefixlen 128 
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2 
    inet netmask 0xff000000 
lo1: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    inet netmask 0xffffffff 
lo2: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    inet netmask 0xffffffff 

root@dns:~ # ifconfig
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 00:22:4d:ad:be:2a
    inet netmask 0xffffffff broadcast 
    media: Ethernet autoselect (100baseTX <full-duplex>)
    status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
lo1: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
lo2: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    inet netmask 0xffffffff

by styko at September 05, 2015 10:37 AM


What is the difference between function-oriented and procedure-oriented? [duplicate]


I am currently using functional programming (F#). I also know C, which is a procedure-oriented language. I am getting confused between functional programming and procedure-oriented programming: the execution seems the same, except one uses functions and the other procedures.

F# treats every statement as a function.

Please clarify the difference between functional and procedure-oriented languages.

by Ravindra Sinare at September 05, 2015 10:03 AM

What are the most efficient representations of numbers without native types?

In Haskell, most numeric types are native: Int, Float, Word32, etc. There is also a popular representation of unary natural numbers with ADTs only, namely the Peano encoding:

data Nat = Succ Nat | Zero

That datatype, while elegant, is not very efficient. Multiplication, exponentiation and division with unary numbers are impractical. My question is: if we didn't have native types to count on, what would be the most efficient representation of numbers - nats, ints, fracs, complex, etc. - in a pure functional language such as Haskell? What would the datatypes and the respective algorithms look like?
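One classical answer for naturals is a binary encoding, which shrinks the representation from O(n) to O(log n) constructors and makes addition linear in the number of bits. Here is a sketch in Python (bit lists stand in for a Haskell ADT such as `data Bin = End | O Bin | I Bin`, least significant bit first; the encoding choice is my own illustration):

```python
def to_bits(n):
    """Little-endian bit list for a nonnegative integer."""
    bits = []
    while n:
        bits.append(n & 1)
        n >>= 1
    return bits

def add(a, b, carry=0):
    """Ripple-carry binary addition: O(max bit-length) steps, versus
    O(n) successor applications for Peano numerals."""
    if not a and not b:
        return [carry] if carry else []
    x = (a[0] if a else 0) + (b[0] if b else 0) + carry
    return [x & 1] + add(a[1:], b[1:], x >> 1)

def value(bits):
    return sum(bit << i for i, bit in enumerate(bits))

print(value(add(to_bits(13), to_bits(29))))  # 42
```

Multiplication becomes shift-and-add in O(bits²), and the same idea extends to integers (sign + magnitude) and rationals (pairs of binary nats).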

by Viclib at September 05, 2015 10:02 AM

How to compose two different `State` monads?

While learning the State monad, I'm not sure how to compose two functions with different State return types.

State Monad definition:

case class State[S, A](runState: S => (S, A)) {

  def flatMap[B](f: A => State[S, B]): State[S, B] = {
    State(s => {
      val (s1, a) = runState(s)
      val (s2, b) = f(a).runState(s1)
      (s2, b)
    })
  }

  def map[B](f: A => B): State[S, B] = {
    flatMap(a => {
      State(s => (s, f(a)))
    })
  }
}

Two different State types:

type AppendBang[A] = State[Int, A]

type AddOne[A] = State[String, A]

Two methods with different State return types:

def addOne(n: Int): AddOne[Int] = State(s => (s + ".", n + 1))

def appendBang(str: String): AppendBang[String] = State(s => (s + 1, str + " !!!"))

Define a function to use the two functions above:

def myAction(n: Int) = for {
  a <- addOne(n)
  b <- appendBang(a.toString)
} yield (a, b)

And I hope to use it like this:


The problem is that myAction does not compile; it reports an error like this:

Error:(14, 7) type mismatch;
 found   : state_monad.State[Int,(Int, String)]
 required: state_monad.State[String,?]
    b <- appendBang(a.toString)

How can I fix it? Do I have to define some Monad transformers?

Update: The question may be unclear, so let me give an example.

Say I want to define another function which uses addOne and appendBang internally. Since each needs its own existing state, I have to pass both in:

def myAction(n: Int)(addOneState: String, appendBangState: Int): ((String, Int), String) = {
  val (addOneState2, n2) = addOne(n).runState(addOneState)
  val (appendBangState2, n3) = appendBang(n2.toString).runState(appendBangState)
  ((addOneState2, appendBangState2), n3)
}
I have to run addOne and appendBang one by one, passing and collecting the states and results manually.

Although I found it can return another State, the code is not much improved:

def myAction(n: Int): State[(String, Int), String] = State {
  case (addOneState: String, appendBangState: Int) =>
    val (addOneState2, n2) = addOne(n).runState(addOneState)
    val (appendBangState2, n3) = appendBang(n2.toString).runState(appendBangState)
    ((addOneState2, appendBangState2), n3)
}

Since I'm not quite familiar with these, I'm just wondering whether there is any way to improve this. My best hope is to use a for comprehension, but I'm not sure that's possible.

by Freewind at September 05, 2015 09:42 AM


How are alphabetic characters programmed into a computer?

I'm no CS student, I'm a programmer. I have a couple of questions and a few assumptions that I will make here (please correct me if I'm wrong).

My understanding is that all the sequences of 1's and 0's that computers execute are just representations of actual data, or instructions that tell other hardware in the system what to do, like telling a graphics card to change a pixel's color on the monitor.

01100001 is the representation of the letter "a", and this sequence of bits was chosen by the fathers of computing/ASCII to represent that letter; it could just as well have been some other sequence, right?

All computer systems have agreed that those bits mean the letter "a", so that computers around the globe are interoperable; if there were no standards in place for this, the internet would be a mess.

What I want to know is: where is the information stored on a computer that tells it that these sequences of bits mean/represent the character "a"? Is it in the OS, or directly on the motherboard, or am I just completely wrong?
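The short answer is that the mapping is a convention baked into software tables (the encoding standard fixes the number ↔ character correspondence, and font files map those numbers to glyphs on screen), not into the motherboard: the CPU only ever sees the numbers. Any programming language exposes the convention directly, for example:

```python
# The encoding is just an agreed-upon numbering, visible from any language:
print(ord('a'))             # 97
print(bin(ord('a')))        # 0b1100001
print(chr(0b01100001))      # a
print('a'.encode('ascii'))  # b'a' -- the byte 0x61 on disk / on the wire
```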

by Larry Lawless at September 05, 2015 09:40 AM

Why are PTAS-Reductions used to establish APX-Hardness?

What's the intuition for using PTAS reductions instead of approximation-preserving reductions or something of that ilk? I'd expect APX-hardness to mean something like "finding a constant-factor approximation for this problem is as hard as finding a constant-factor approximation for any problem in APX", but merely having a PTAS reduction from A to B isn't strong enough to imply that a constant-factor approximation to B can be converted into a constant-factor approximation of A.

by Mike at September 05, 2015 09:14 AM



What are the clear differences between MIPS, PPC and ARM architectures? [on hold]

I would like to know the clear differences between MIPS, PPC and ARM architecture processors: the main focus of each architecture, the instruction sets, power consumption, processing speed, processor design, and which sectors of industry each is used in.

by prashantb j at September 05, 2015 08:01 AM

What is the difference (if any) between transition systems and finite automata?

Is there any difference between transition systems and finite automata? Is it that transition systems encompass both NFAs (nondeterministic finite automata) and DFAs (deterministic finite automata)?

by ishaarora95 at September 05, 2015 07:55 AM



Is this language LL(1) parseable?

I tried to find a simple example of a language that is not parseable with an LL(1) parser, and finally arrived at this language:

$$L=\{a^nb^m|n,m\in\mathbb N\land n\ge m\}$$

Is my hypothesis true or is this language parseable with an LL(1) parser?

One can use this simple grammar to describe $L$ (of course this grammar itself isn't LL(1)):

S -> ε
S -> A
A -> aA
A -> aAb
A -> a
A -> ab

by FUZxxl at September 05, 2015 07:21 AM


Need help thinking of modern uses for older memory management systems.

I have a homework problem I'm having trouble with. I need to think of present-day uses for Single User, Fixed, Dynamic and Relocatable Dynamic allocation schemes, but for whatever reason I can't think of any.

Like I understand Single User only loads one job at a time.

Fixed loads multiple jobs into preset sections of memory and is good if the jobs are all of similar size; it helps if you know all the job sizes in advance.

Dynamic loads jobs into memory wherever they fit but can get fragmented after the initial jobs are loaded.


But for the life of me I can't think of uses for them. Does anyone have some reading I could do to get some ideas?

submitted by Leering

September 05, 2015 07:13 AM


Why are computable numbers (in Turing's sense) enumerable?

Why are computable numbers (in Turing's sense) enumerable? It must be very obvious, but I'm currently just not seeing it.

by Michiel Borkent at September 05, 2015 06:47 AM

FPT algorithm for point line cover

In the paper "Covering Things with Things" by Langerman and Morin, they mention BST-Dim-Set-Cover, an FPT algorithm for point-line cover, on page 666.

The algorithm chooses a point p from a set S and then branches on k lines through it.

My question concerns where p is taken from, in the text on page 666: at one point it says "Otherwise, the algorithm chooses an element p ∈ S'", but further down, the algorithm says "choose p ∈ S" (without the prime symbol).

Is p taken from S or S' (S prime)?


by testTester at September 05, 2015 06:42 AM


Source of market or security attribute information?

There are many securities and exchanges on platforms like Bloomberg and Quandl, and securities need to be described with the relevant pit close and open times, exchanges, related futures, and settlement procedures.

Is there a reliable source where people go to find these things out? Kind of like a one-stop shop.

by user1367204 at September 05, 2015 06:41 AM

Performance of Open Source Time Series Database for Financial Market Data

We would like to store financial tick data in a database (potentially billions of rows) and then create aggregated open-high-low-close bar data from it (e.g. 1-min or 5-min bars).

It was mentioned to us that a NoSQL or time-series database might be a good choice for this. Can anybody give advice on which open-source product might fit this requirement best?

Note: query performance is very important for us.

In our research we came across the following products (maybe there are more):

  • InfluxDB
  • OpenTSDB
  • KairosDB
  • MonetDB

We ran a test with InfluxDB on around 10 million ticks. Unfortunately the creation of 1-min bars was 3-5 times slower than with a relational database (i.e. MySQL).

We are aware that kdb+ now offers a free 32-bit version, but unfortunately 32-bit will not be enough for our use case.

Any advice is appreciated.

by Andy Flury at September 05, 2015 06:38 AM


Read a single character or byte from stdIn without waiting for newline in SML

I encountered a problem while working with the TextIO structure: every input waits for a newline character and for the buffer to fill. How can I work with BinIO and stdIn to solve that problem?

Any helpful input is appreciated.

BTW: I am using MLton, so there is nothing more than the usual standard libs.

by Flux at September 05, 2015 06:22 AM


I am having trouble understanding (and implementing) logistic regression for classifying into three classes

(For reference, I am using Kevin P. Murphy's book "Machine Learning: A Probabilistic Perspective" and implementing in MATLAB, without any toolboxes.)

I have a dataset with 392 samples (rows); each sample has 8 features (columns), one of which defines the class (i.e. column 1 of the features is divided into three equal bins which define the three classes: low, medium, and high).

I am having a really hard time understanding how to create a logistic regression model to classify a sample into one of these three classes.

I just finished learning about and building a linear regression model, where I learned about both the Ordinary Least Squares (closed-form) solution for the weight vector and the gradient descent (iterative) solution. But I never implemented gradient descent, because my data was fitted perfectly well by the OLS solution.

I am extremely confused about how to create a weight vector for logistic regression. I understand that it requires gradient descent because there is no closed-form solution. I also read about Newton's method for calculating the weights, but I don't understand it at all.

And after you use these methods to calculate the weights, how do you apply them to the sample data? In linear regression it was simple: you just multiplied the weights by the features (and higher-order features for higher-order regression). Is it the same in logistic regression?

Moreover, my understanding so far is that this model only works for binary classification, so how would I do it for three classes?

Basically my question boils down to this:

How exactly do you find the weight vector for logistic regression (using either gradient descent or Newton's method, whichever is easier), and how do you apply that weight vector to a sample to get a classification over 3 classes (not just two)?
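For what it's worth, here is a hedged sketch (my own, not from Murphy's book, and in Python rather than MATLAB) of the standard multinomial generalization: the weights become a d × K matrix learned by gradient descent on the cross-entropy loss, and classification applies the same "multiply features by weights" step, followed by softmax and argmax over the K = 3 classes. All names and the toy data are made up:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax(X, y, n_classes, lr=0.5, iters=5000):
    """Multinomial logistic regression by batch gradient descent.
    X: (n, d) features (include a column of ones for the bias);
    y: (n,) integer labels in {0, ..., K-1}. Returns W of shape (d, K)."""
    n, d = X.shape
    Y = np.eye(n_classes)[y]          # one-hot targets, shape (n, K)
    W = np.zeros((d, n_classes))
    for _ in range(iters):
        P = softmax(X @ W)            # predicted class probabilities
        W -= lr * X.T @ (P - Y) / n   # gradient of the cross-entropy loss
    return W

def predict(W, X):
    # Same "multiply features by weights" step as linear regression,
    # followed by softmax + argmax over the K classes.
    return softmax(X @ W).argmax(axis=1)

# Tiny synthetic 3-class example (1 feature + a bias column of ones):
X = np.array([[1.0, v] for v in [0.0, 0.1, 0.2, 1.0, 1.1, 1.2, 2.0, 2.1, 2.2]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
W = fit_softmax(X, y, n_classes=3)
print(predict(W, X))  # ideally recovers [0 0 0 1 1 1 2 2 2]
```

For binary problems this collapses to ordinary logistic regression; Newton's method would replace the fixed step with a Hessian-scaled one but converges to the same weights.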

by user2237160 at September 05, 2015 05:44 AM




LPs with "sparse solutions"

Consider a graph optimization problem on a graph with $n$ vertices and $m$ edges that can be written as an LP (say, bipartite matching). By duality with vertex cover, we know that there's a sparse dual solution consisting of the elements of the cover. It's sparse in the sense that its size is linear in $n$ rather than in $m$.

Consider the problem of finding the minimum enclosing ball of a set of $n$ points in $d$ dimensions. This is an LP-type problem with (again) a sparse dual solution: a set of $d+1$ points that define the optimal ball. Here, "sparse" means that the size depends on the (smaller) parameter $d$ rather than $n$.

In general, suppose I have a problem that can be expressed via an LP with $n$ variables and $m$ constraints, where $m \gg n$. Are there classes of such problems for which any optimal solution can be expressed using the intersection of only $O(n)$ constraints (or, dually, with only $O(n)$ nonzero dual variables)?

by Suresh Venkat at September 05, 2015 05:13 AM


Global minimum when noisy estimates of function available

I am interested in knowing about algorithms for finding global minimum when noisy estimates of the function are available.

by RIchard Williams at September 05, 2015 04:37 AM


Construction of a graph which has regular subgraphs at each iteration of a recursive process

I am studying graph isomorphism and also trying to figure out the complexity of a certain class of graphs. The graph I am studying at the moment is described below.

Description :

$G$ is an $r$-regular, $k$-connected graph (not a complete or cycle graph). A vertex of $G$ is $x_1$. All vertices which are not adjacent to $x_1$ create a sub-graph $C_1$. All vertices adjacent to $x_1$ create a sub-graph $D_1$. A vertex of $D_1$ is $x_2$.

Using the same method, $D_1$ can be divided based on the adjacency of $x_2$.

All vertices which are not adjacent to $x_2$ create a sub-graph $C_2$.

All vertices adjacent to $x_2$ create a sub-graph $D_2$. In general, $D_{y-1}$ is a graph and can be divided/partitioned into 2 sub-graphs $C_y, D_y$.

There are 2 restrictions, they are-

  1. $C_y$ and $D_y$ are $s_y$-regular and $t_y$-regular graphs respectively, with $s_y, t_y > 0$ and $s_y \neq t_y$, for every iteration $y$

  2. $C_y, D_y $ cannot be complete bipartite graph (utility graph), complete graph or disjoint union of complete graphs.

    $G$ can be divided/partitioned at most $\log_2(|G|)$ times, using this dividing process recursively, if $G$ is a 3-clique-free (triangle-free) graph; this condition implies $t_y \leq |D_y|/2$.

Questions :

Does there exist such a graph in the current literature as described above (satisfying restriction 1, 2, or both)?

Motivation: an algorithm runs faster than existing results for this kind of graph.
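To make the dividing process concrete, here is a small hypothetical sketch of a single splitting step, with the regularity of the parts checked by a degree count (the Petersen graph in the usage below is only a toy input; it even fails restriction 1, since its $D$ part is 0-regular, but it shows the mechanics):

```python
def split_by_vertex(adj, x):
    """One step of the dividing process: among the remaining vertices,
    C = those not adjacent to x, D = those adjacent to x."""
    rest = set(adj) - {x}
    D = adj[x] & rest
    C = rest - D
    return C, D

def induced_degrees(adj, verts):
    """Degrees in the subgraph induced on verts (to check regularity)."""
    return {v: len(adj[v] & verts) for v in verts}
```

For the Petersen graph split at vertex 0, the non-neighbours form a 2-regular subgraph and the neighbours form an edgeless one.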

Related papers:

  1. V.A.Taskinov. Regular subgraphs of regular graphs. Sov.Math.Dokl.26(1982), 37-38 . In this paper he proved the following : Let $0<k<r$ be positive odd integers. Then every $r$-regular graph contains a $k$-regular subgraph (Here, the graph need not be simple).

  2. N. ALON,S. FRIEDLAND,G. KALAI. Regular Subgraphs of Almost Regular Graphs. JOURNAL OF COMBINATORIAL THEORY, Series B 37, 79-91 (1984)

    Any advice will help.

by Jim at September 05, 2015 04:18 AM




Covariance Not Working Properly in R

I'm trying to run the following code in R

 X6=matrix(c(1,18.95,2,19,3,17.95,3,15.54,4,14,5,12.95,6,8.94,8,7.49,9,6,11,3.99), 10, 2, byrow=TRUE)
 (9/10)*cov(X6, method="pearson")

This gives me the matrix

            [,1]      [,2]
    [1,]   9.5600 -15.93920
    [2,] -15.9392  27.76893

But it should be giving me something close to

        [,1]      [,2]
   [1,]  3.09  -15.94
   [2,] -15.94  5.27

What's the deal here?
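For comparison, numpy makes the divisor explicit via its ddof parameter; the sketch below (an assumption about what the R code is doing, since R's cov() divides by n-1) reproduces the relationship between the sample and population covariance for the same data:

```python
import numpy as np

# Same data as the R example: column 1 is X, column 2 is Y.
X6 = np.array([[1, 18.95], [2, 19], [3, 17.95], [3, 15.54], [4, 14],
               [5, 12.95], [6, 8.94], [8, 7.49], [9, 6], [11, 3.99]])

sample_cov = np.cov(X6.T, ddof=1)      # divides by n-1, like R's cov()
population_cov = np.cov(X6.T, ddof=0)  # divides by n
```

For n = 10 points, population_cov equals (9/10) * sample_cov, which is exactly what the R snippet computes. Note that the diagonal entries of either matrix are variances; their square roots are the standard deviations.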

by pretzelman at September 05, 2015 02:53 AM


16, UK. Want to start a company when I'm older, need advice.

• How do I build a multi-million pound business? Here's my plan:

  1. Go to Oxbridge or any other uni

  2. Within 1 to 2 years, use knowledge from the degree (e.g. natural language processing) to make an app/program/startup

  3. Somehow get funding

  4. ??? Profit

Reddit, is this realistic? If not, how can I reach my dream of being, I don't know, the next Mark Zuckerberg or whatever? Is it possible? If not, could you guys give me a sense of direction/advice?

P.S. Please don't kill my dream, thanks!

submitted by Brand248
[link] [1 comment]

September 05, 2015 02:06 AM


A bot to cleanup a map

We're given a robot that can sense its immediate surroundings to the north, south, east and west, has space to store a state 0-100, and a room to clean up, with obstacles in it. The robot has a fixed set of instructions that tell it which way to go and which state to enter based on the current state and surroundings. I'd like to find a way to completely clean any room that is one connected space, no matter where in the room the robot starts. It's OK if the algorithm isn't efficient. As soon as the bot walks onto a cell, the cell is cleaned.
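Setting the 0-100 state restriction aside for a moment (that restriction is the genuinely hard part of the puzzle), the cleaning goal itself is just a traversal of the connected free region; a sketch with an explicit stack, which a finite-state robot does not have:

```python
def clean_room(grid, start):
    """Depth-first traversal of the free cells ('.') reachable from start.
    Returns the set of cleaned cells; '#' marks obstacles."""
    rows, cols = len(grid), len(grid[0])
    cleaned, stack = set(), [start]
    while stack:
        r, c = stack.pop()
        if (r, c) in cleaned:
            continue
        cleaned.add((r, c))
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # N, S, W, E
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == '.' and (nr, nc) not in cleaned:
                stack.append((nr, nc))
    return cleaned
```

Classic constant-memory alternatives exist for special cases (e.g. wall following for simply-connected rooms), which is closer to what the bounded state actually allows.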

by Richard Xu at September 05, 2015 01:50 AM



Need help designing a data structure

I have to design a data structure that's essentially a half table, not unlike Unity's collision matrix. I have a set of strings; two strings get passed to the structure as the indexes of an object. The problem I'm having is that something like a 2D array leaves me with a lot of duplicates (e.g. ["something","nothing"] and ["nothing","something"] would be 2 different objects). I don't want the order of the indexes to matter. Ignoring order is easy with something like integers, because I can just multiply them and use a 1D array, since the order wouldn't matter.

I know it's obvious with integer indexes because it's so much easier to OR two integers and get the bit flags (just as the Unity collision matrix is really just setting bit flags).

I've spent a good couple of days nailing my fingers to the keyboard trying to figure this out and research it, but I quite frankly can't even come up with the name of such a table! So a side question is: does anyone here know what such a table (see the Unity collision matrix) would be called, where the rows and columns use the same indexing, but the columns are in the opposite order of the rows so there are no duplicate crosses?

I know I'm probably making very little sense here, I've been up way too many hours (workaholic, love my job).

Edit: Thank you /u/program_the_world and /u/asthasr for pointing out where I went wrong in approaching this, I can't believe I didn't think about hashing.
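For what it's worth, one common hashing trick here is to normalise the pair before indexing, e.g. with a frozenset, which is hashable and order-insensitive (a sorted tuple works too). A hypothetical sketch with made-up names:

```python
class PairTable:
    """Half-table keyed by unordered string pairs."""

    def __init__(self):
        self._cells = {}

    @staticmethod
    def _key(a, b):
        return frozenset((a, b))  # hashable and order-insensitive

    def set(self, a, b, value):
        self._cells[self._key(a, b)] = value

    def get(self, a, b, default=None):
        return self._cells.get(self._key(a, b), default)
```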

submitted by Destects
[link] [16 comments]

September 05, 2015 01:11 AM



Mozilla's bug tracker has been hacked, apparently ...

Mozilla's bug tracker has been hacked, apparently for several months. Well, you might be thinking now: so what, it's all public anyway. No, it is not. Security bugs are not public. And those are exactly what has now been carried out of there.

That's how it goes when you don't fix security stuff immediately but believe you can sit on it for another few weeks to months. Merkel-style politics is always crap, in politics as in software projects. Sitting things out doesn't work.

September 05, 2015 01:00 AM


What is the correlation of stock options?

I want to calculate the VaR of two correlated option positions, and I know the correlation between the stock price returns. I want to separately calculate $VaR_1$ and $VaR_2$ for options 1 and 2, and then use $$\sqrt{(w_1 VaR_1)^2+(w_2 VaR_2)^2+2\,corr\, w_1 w_2\, VaR_1 VaR_2}$$ to calculate the portfolio VaR, but I don't know whether the option correlation is the same as the correlation of the underlying stock returns.
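As a numeric sanity check of the combination formula itself (leaving aside the open question of what correlation to plug in, which is just a parameter here), a small sketch:

```python
from math import sqrt

def portfolio_var(w1, var1, w2, var2, corr):
    """Two-position VaR aggregation under the quadratic formula:
    sqrt((w1*VaR1)^2 + (w2*VaR2)^2 + 2*corr*w1*w2*VaR1*VaR2)."""
    return sqrt((w1 * var1) ** 2 + (w2 * var2) ** 2
                + 2 * corr * w1 * w2 * var1 * var2)
```

With corr = 1 the positions add linearly; with corr = 0 they combine Pythagorean-style, which is a quick way to test an implementation.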

by ZHI at September 05, 2015 12:41 AM

Sources of index data (MSCI, FTSE, S&P etc.)?

Who are the major suppliers of index data that cover multiple index providers, e.g. MSCI, FTSE, S&P etc?

There are a huge number of people sourcing e.g. equity data, but index data is much harder to come by.

I am only looking for very high level information and at low frequency - the EOD number for each index would almost be enough.

Basic information on the constituents and their weights would also be useful, but in-depth analysis of the indexes and their constituents isn't required.

Sites like Yahoo Finance seem to cover only a small number of major indexes, whereas I'm interested in sector indexes and the like that aren't commonly covered, e.g. MSCI World Telecommunication Services.

I am looking for a provider with historical data (going back e.g. 10 years) who will also provide data day-by-day going forward.

Currently I work with a data aggregator who, before they supply particular data, requires us to negotiate agreements directly with the individual index providers, e.g. MSCI or FTSE, rather than handling this on our behalf.

They also provide the data in a highly non-standardized form, e.g. data from MSCI will come in a different form to that from FTSE.

And they only sell the data in very inflexible bundles; e.g. rather than just buying basic EOD numbers, one has to buy a hugely expensive package of in-depth analysis of which those numbers are a tiny part.

It's not that I don't expect to pay for this data - but it would be nicer if possible to pay for just the basic data needed and to have the negotiation with the actual index providers handled behind the scenes by the data provider themselves.

Apologies if you feel that this, as a rather commercial question rather than something interesting on e.g. option pricing, isn't directly appropriate to this Stack Exchange.

by George Hawkins at September 05, 2015 12:38 AM


Why is least fixed point (lfp) important in program analysis

I am trying to get a big picture of the importance of the least fixed point (lfp) in program analysis. For instance, abstract interpretation seems to rely on the existence of the lfp. Many research papers on program analysis also focus heavily on computing the least fixed point.

More specifically, the Wikipedia article on the Knaster-Tarski theorem mentions that lfps are used to define program semantics.

Why is it important? Any simple example helps me. (I am trying to get a big picture).
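As a simple concrete example: the least fixed point is exactly what naive bottom-up (Kleene) iteration computes, and graph reachability is the textbook instance; a sketch:

```python
def lfp(f, bottom=frozenset()):
    """Least fixed point of a monotone function on finite sets,
    computed by Kleene iteration: bottom, f(bottom), f(f(bottom)), ..."""
    x = bottom
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt

def reachable(succ, start):
    """Reachability as an lfp: the least S with S = {start} ∪ succ(S)."""
    def step(s):
        out = set(s) | {start}
        for v in s:
            out |= succ.get(v, set())
        return frozenset(out)
    return lfp(step)
```

Dataflow analyses (live variables, reaching definitions, the abstract semantics in abstract interpretation) have the same shape: a monotone transfer function over a lattice, whose least fixed point is the most precise answer that is still sound.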

by Ram at September 05, 2015 12:27 AM

HN Daily

September 04, 2015


Gamma Imbalance Explanation

Can someone please give me an explanation as to what put-call gamma imbalance specifically refers to (imbalance of what?), and why they may exacerbate volatility from a market perspective, and why the risk increases on option expiry day?

My guess is it that these imbalances force option sellers/dealers to more aggressively delta-hedge their positions, but I would like to have a better grasp on what is really going on.

Some confusion is coming from the fact that I thought put and call gamma must be equal, else put/call parity would be violated, based on put delta + call delta = 1 for a given option strike.

by user7889 at September 04, 2015 11:50 PM


Why is ${n \choose 2}$ a $\Theta(n^2)$ computation?

In Cormen's Algorithms textbook, the author casually mentions that ${n \choose 2}$ is $\Theta(n^2)$.

Why is this so? In the calculation

$$ {n \choose 2} = \frac{n!}{(n-2)!2!} $$

there are $n$ multiplication operations in the numerator and $n$ multiplications in the denominator. That means there are at most $2n+1$ computations to make here (one for the division step), no?
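For reference, the factorials cancel almost completely, which may be the step being glossed over:

$$ {n \choose 2} = \frac{n!}{(n-2)!\,2!} = \frac{n(n-1)}{2} = \frac{n^2-n}{2} = \Theta(n^2) $$

so the remark is presumably about the size of the value ${n \choose 2}$ itself, which grows quadratically in $n$, rather than about the number of arithmetic operations a particular evaluation uses.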

by PP121 at September 04, 2015 11:32 PM


Can a large OpenInt of calls cause a stock to go down?

I read forum post from another site. Which stated...

XYZ has 60k+ CALL contracts in the money for this Friday vs 5k PUTS.
That will be extra 5,500,000 supply this Friday.
Big drop!
Chances of XYZ being green tomorrow = 0%
4 mln shares supply (20% daily volume) from OPEX imbalance will accelerate profit taking run.

My question is: is this a technically valid statement? If so, why? What happens to the underlying when someone closes or exercises an in-the-money call that causes an addition to the daily volume of the stock?


by cube at September 04, 2015 11:09 PM


Is there a formal CS definition of VCS and file versions?

I don't know whether it was a joke, but I once read what was referred to as a formal definition of a file in a versioning system such as git, hg or svn. It described the file as a mathematical object, something like a homeomorphism. Was that a joke, or is there really computer science theory about versioning systems and the mathematics of VCS?

by Programmer 400 at September 04, 2015 10:38 PM

Quantum computing roadmap

I have to create a roadmap for quantum computing technology. Looking around, I found the timeline on Wikipedia, which is pretty wide but neither highlights the key events in quantum computing research nor sets out the possible future of the research.

Could somebody help me to define the key events in the quantum computing field? When (and why) did we start to explore this technology, which milestones have we passed through, and (maybe) what can we set as future milestones?

by Ste at September 04, 2015 10:38 PM


Utility Theory - Certainty equivalent approximation formula derivation

I have a question on an exercise from chapter 9 of D. Luenberger, Investment Science, International Edition, where I suspect there may be a typo.

Exercise 8 (Certainty approximation)

There is a useful approximation to the certainty equivalent that is easy to derive. A second-order expansion near $\bar x=E(x)$ gives $$U(x)\approx U(\bar x)+U^{'}(\bar x)(x-\bar x)+\frac12U^{''}(\bar x)(x-\bar x)^2$$

Hence, $$E[U(x)]\approx U(\bar x)+\frac 12 U^{''}(\bar x)var(x)$$ On the other hand, if we let c denote the certainty equivalent and assume it is close to $\bar x$, we can use the first-order expansion $$U(c)\approx U(\bar x)+U^{'}(\bar x)(c-\bar x)$$ Using these approximations, show that $$c \approx \bar x+{U^{''}(\bar x) \over U^{'}(\bar x)}var(x)$$

Now, I used general methods of algebra along with the fact that $E[U(x)] = U(c)$ to show directly that $$c \approx \bar x+\frac 12 ({U^{''}(\bar x) \over U^{'}(\bar x)})var(x)$$ as follows:

Take the third equation and transform it into $$ c\approx \bar x + {U(c)-U(\bar x) \over U^{'}(\bar x)}$$ Now all I have to do is show that the numerator in the fraction part is $\frac 12 U^{''}(\bar x)var(x)$ which is done by putting $E[U(x)] = U(c)$ into the second formula and you can see the result is immediately there.

On top of this work, I wrote out an example of an investment with the log utility function and showed that my approximation for c worked whereas the book's formula without the "2" didn't.

However, I would like to post all this here just to verify that this is a typo from the book and not some misunderstanding on my part.

Thanks in advance for any feedback.

by James at September 04, 2015 10:10 PM




What is the opposite of screen-scraping?

I'm familiar with the idea of Screen-scraping - getting data from websites where there's no nice feed to use.

Is there a name for the opposite i.e. when your program simulates a human inputting data to a site?

submitted by tpedar50
[link] [11 comments]

September 04, 2015 09:53 PM


Segment Tree, with queries and range update [on hold]

How do I support range LCM queries and point updates on an array, given that the values can be large, so return the answer mod 10^9+7?

The queries are of two types:

0 L R -- find the LCM of the elements from L to R (both inclusive); return ans mod 10^9+7

1 P V -- update the element at index P with value V

(1-based indexing.)

For example:

INPUT:
6 (number of elements)
1 3 2 5 15 3
6 (number of queries)
0 1 6
1 3 5
0 2 3
1 5 7
0 3 5
0 1 6

OUTPUT: 30 15 35 105
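A minimal Python sketch of one possible approach (point updates, range LCM queries) is below. It leans on Python's arbitrary-precision integers so the reduction mod 10^9+7 only happens when reporting; a contest-grade C++ solution would need something cleverer, since intermediate LCMs overflow fixed-width integers.

```python
from math import gcd

MOD = 10**9 + 7

def lcm(a, b):
    return a * b // gcd(a, b)

class LcmSegTree:
    """Iterative segment tree: leaves hold the array, each internal
    node holds the LCM of its two children."""

    def __init__(self, data):
        self.n = len(data)
        self.tree = [1] * (2 * self.n)
        self.tree[self.n:] = data
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = lcm(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, pos, value):  # 0-based index
        i = pos + self.n
        self.tree[i] = value
        while i > 1:
            i //= 2
            self.tree[i] = lcm(self.tree[2 * i], self.tree[2 * i + 1])

    def query(self, left, right):  # inclusive 0-based range
        res = 1
        lo, hi = left + self.n, right + self.n + 1
        while lo < hi:
            if lo & 1:
                res = lcm(res, self.tree[lo]); lo += 1
            if hi & 1:
                hi -= 1; res = lcm(res, self.tree[hi])
            lo //= 2; hi //= 2
        return res % MOD
```

Build is O(n), each update and query O(log n), which handles the example above directly (after shifting the 1-based indices to 0-based).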

by breakit at September 04, 2015 09:40 PM


Program reasoning about own source code

The inspiration for this question is the following (vague) question: What are the programming language/logical foundations for having an AI which could reason about its own source code and modify it?

This isn't at all rigorous, so here is my attempt to extract a concrete question out of it. There are two things I'm interested in:

(A) A programming language P that can represent and manipulate its own programs as a datatype Program (e.g., as an AST). (If desired, an object of type Program can be converted into a String, which is the text of a valid program in that language. This would be the opposite of what a compiler does.)

(B) A method to reason about what a program in language P does. Here are two levels I'm thinking about it:

  1. Another language Q (with theorem proving capabilities) which models what a P-program does. It should be able to express and prove statements like "the outcome of running Program p is foo."
  2. A way to reason about what a program p:Program does in the language P itself. (So we are taking P=Q above.)

To what degree has something like this been implemented, or what is progress in this direction? What are the practical obstacles? In light of the original intention of the question, what is the best way to formalize the problem?
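On the practical end of (A): mainstream languages already expose their own programs as a datatype. A small Python sketch using the ast module (illustration only, and Python 3.9+ for ast.unparse; this gives (A)'s representation, not (B)'s reasoning power):

```python
import ast

source = "def f(x):\n    return x + 1"

class ConstBump(ast.NodeTransformer):
    """Rewrite every integer literal n into n + 10."""
    def visit_Constant(self, node):
        if isinstance(node.value, int):
            return ast.copy_location(ast.Constant(node.value + 10), node)
        return node

tree = ast.parse(source)                 # source text -> Program datatype (AST)
new_tree = ast.fix_missing_locations(ConstBump().visit(tree))
new_source = ast.unparse(new_tree)       # Program -> source text again

namespace = {}
exec(compile(new_tree, "<ast>", "exec"), namespace)  # run the modified program
```

Lisp's quote/unquote, MetaOCaml's staging, and Template Haskell are richer variants of the same representation step; the hard part of the original question is layering (B)'s proofs on top of it.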


As the answers show (thanks!), both (A) and (B1) can be done separately, though it seems doing them together is more of a research question.

Here were some of my first thoughts on the question (warning: rather vague). See also my comments on Martin Berger's answer.

I'm interested in the programming language modeling the same programming language, rather than a simpler one (so P=Q above). This would be a "proof of concept" of a program being able to "reason about its own source code." Dependently typed programming languages can give guarantees about the outputs of their functions, but this doesn't count as "reasoning about its own source code" any more than a "Hello world!" counts as a quine in a language that automatically prints out a naked string---there needs to be some kind of quoting/self-reference. The analogue here is having a datatype representing Program.

It seems like a rather large project - the simpler the language, the harder it is to express everything within it; the more complicated the language, the more work has to be done to model the language.

In the spirit of the Recursion Theorem, a program can then "get" its own source code and modify it (i.e., output a modified version of itself). (B2) then tells us that the program should be able to express a guarantee about the modified program (this should be able to recurse, i.e., it should be able to express something about all future modifications-?).

by Holden Lee at September 04, 2015 09:33 PM


Compile with F5

I have been trying to switch from vim to emacs + evil for the past week, so I'm very new with all this stuff.

I'm trying to bind F5 to compile a cpp file with a specific shell command:

(defun c-save-compile ()
  (interactive)
  (save-buffer)
  (defvar user-foo)
  (setq user-foo (concat "g++ " (buffer-name) " -g -pg -std=c++0x -o a.out"))
  (shell-command user-foo))

(add-hook 'c++-mode-hook
          (lambda ()
            (define-key c++-mode-map (kbd "<f5>") 'c-save-compile)))

This is not working; when I press F5 in a cpp buffer I only get <f5> is undefined.

Can anyone explain to me what's going on?


submitted by Ursao
[link] [13 comments]

September 04, 2015 09:14 PM


Logout from jail

I'm trying to use FreeNAS on my office along with Syncthing, and some things are giving me headache.

FreeNAS installed Syncthing as a jail, and I used the jexec command to log into the jail, but now I can't figure out a way to log back into my server. I'm root@syncthing instead of root@freenas. How do I undo the jexec in the shell?

by SHiRKiT at September 04, 2015 08:59 PM


Using an older version of data structures and algorithm in java [on hold]

Have data structures and algorithms in Java changed during the past 13 years? Would I be able to learn the concepts from a textbook published in 2002, Data Structures and Algorithms in Java by Robert Lafore?

by Rivera0527 at September 04, 2015 08:40 PM

How to determine number of instances for resources in a resource allocation graph? [duplicate]

I am struggling to determine the number of instances a resource will have. I have performed extensive research on this, but unfortunately have come up short. I am tasked with drawing a resource allocation graph and I have been provided with the number of processes, which is 3, and the number of resources, which is 5. I have concluded that when drawing the resource allocation graph, I will need to know the number of instances of each resource. Could someone help with how I would go about working out or determining the number of instances a resource will have?

by SilverFear at September 04, 2015 08:39 PM

What is the best average-case scheduling policy for a multiprogrammed batch system?

Out of the various scheduling policies, which would be the best recommended scheduling policy for optimizing a multiprogrammed batch system? I'm looking for a general-case policy, if we are not given information on the jobs or needs. For instance, if there was a general purpose library implementing different policies, what would it make the most sense to default to?

by Osiris93 at September 04, 2015 08:30 PM


Historical volatility from close prices (Haug pg 166)

I have implemented a function for calculating historical volatility using the close-to-close method as described by Haug on page 166.

When I implemented the formula given by Haug, it resulted in some negative values for the variance. The data I am using is not suspect, so the fault must lie in either:

  • my implementation or
  • the formula itself.

Here is a rough sketch of my implementation function (in Python)

# close_prices is a time sorted list of pricedata objects
# returns a list of calculated volatilities
def calc_volatility(close_prices, period):
    startpos, stoppos, raw_vols = (0, period, [])
    while True:
        subset = close_prices[startpos:stoppos+1]
        period_returns      = [ subset[i].close_price/subset[i-1].close_price for i in range(1,len(subset)) ]
        log_returns         = [ math.log(x) for x in period_returns ]
        log_squared_returns = [ math.log(pow(x,2)) for x in period_returns ]

        sum_squares_1 = sum( log_squared_returns ) / (period-1)
        sum_squares_2 = pow( sum( log_returns ), 2) / (period * (period -1))

        variance = sum_squares_1 - sum_squares_2
        print "ss1: {0} - ss2: {1}. Variance: {2}".format(sum_squares_1, sum_squares_2, variance)

        volatility = math.sqrt(variance)

        raw_vols.append (volatility)

        startpos += period
        stoppos  += period

        if stoppos >= len(close_prices):
            break

    return raw_vols

Is there something wrong in my implementation, or is the formula I am using, incorrect?

by Homunculus Reticulli at September 04, 2015 08:28 PM


For every 'evil' regex, does there exist a non-evil alternative, or is the devil in the grammar?

Apparently, ReDoS attacks exploit characteristics of some (otherwise useful) regular expressions ... essentially causing an explosion of possible paths through the graph defined by the NFA.

Is it possible to avoid such problems by writing an equivalent 'non-evil' regex? If not (that is, if the grammar can't be handled in practical space/time by an NFA), what parsing approaches would be better? Why?
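As a concrete illustration (a classic example, not tied to any particular ReDoS writeup): the nested-quantifier pattern (a+)+$ and the plain a+$ accept exactly the same strings, but a backtracking engine can take exponential time on the former for almost-matching inputs like "aaa...a!":

```python
import re

evil = re.compile(r"^(a+)+$")   # nested quantifiers: catastrophic backtracking
safe = re.compile(r"^a+$")      # same language, linear-time matching

def same_verdict(s):
    """Both patterns accept or reject s identically (only safe on short s
    for the evil pattern, since long non-matches make it blow up)."""
    return bool(evil.match(s)) == bool(safe.match(s))
```

Rewriting to remove the ambiguity is possible whenever the language itself is regular, since every regular language has a deterministic (linear-time) matcher; the blow-up is a property of the backtracking implementation, not of the language.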

by David Bullock at September 04, 2015 08:24 PM




AWS Webinars – September, 2015

I have always advised my family, friends, colleagues, and audiences to plan to spend some time learning something new every day. If you don’t take responsibility for your own continued education, you can soon find that you have fallen far behind your peers!

If you are interested in staying current with AWS, I have a couple of recommendations. You can read this blog, follow me (and @awscloud) on Twitter, check in with the AWS What’s New page from time to time, and make use of our self-paced labs.

When you are ready for a more in-depth look at a certain topic or service, I would strongly advise you to attend one or more of our webinars!

Every month my colleagues assemble a lineup of sessions that are hand-selected to provide you with a detailed look at a subject that they believe will be of value to you. The webinars are presented by senior members of the team including evangelists, product managers, and solution architects.

With that said, I would like to introduce you to our September 2015 webinars (all times are Pacific).

There’s no charge and no obligation, of course! Please feel free to sign up, and to share this post with your friends and colleagues.


PS – Please feel free to suggest new topics by leaving a comment on this post.

by Jeff Barr at September 04, 2015 07:51 PM

DragonFly BSD Digest

CDBUG: Patrick Muldoon on DNS, 2015/09/08

CDBUG is having a presentation on DNS, given by Patrick Muldoon, on Sept. 8th.  That’s next Tuesday.  If you are anywhere near Albany, go visit.

by Justin Sherrill at September 04, 2015 07:43 PM


Is it possible to prove thread safety?

Given a program consisting of variables and instructions which modify these variables, and a synchronization primitive (a monitor, mutex, Java's synchronized or C#'s lock), is it possible to prove that such a program is thread safe?

Is there even a formal model for describing things like thread safety or race conditions?
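For a feel of what the primitives buy you operationally (an illustration, not a proof technique): an unsynchronised read-modify-write on shared state is the canonical race, and holding a mutex around it serialises the updates. A Python sketch with threading.Lock standing in for the primitives named above:

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        with lock:              # critical section: read-modify-write
            counter += 1

threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Proving the general property is the realm of formal models like Hoare/Owicki-Gries-style concurrent program logics and model checkers such as SPIN or TLA+, which exhaustively explore interleavings of exactly this kind of program.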

by Emiswelt at September 04, 2015 07:27 PM

Minimising sum of consecutive points distances Manhattan metric

I have set of points (two dimensions, $X, Y$).
Coordinates are floating point numbers.

The objective is to sort them in such a way that the sum of the distances between consecutive sorted points is minimal.

I tried to solve this problem by localisation (dividing the plane into smaller grids of sorted points) and then connecting the clusters, but this is a greedy, approximate approach. The problem here is in setting the grid size, and it is not guaranteed to give any bound on how close to optimal the solution is; it is also rather time consuming (without a well-defined objective function).

Another attempt was to sort them by the X coordinate and then find a minimum, but this approach fails.

Normal sorting by the two coordinates fails and is highly unstable.

The problem resembles the TSP, but the "path" is not closed and the metric used is L1 (Manhattan, taxicab).

The solution I am looking for will probably be approximate.
Finding the minimal and second-minimal distance from every point and connecting them gives a good, but certainly not minimal, sum.

The problem I am having is with defining the objective function.
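One well-defined baseline with an explicit objective is the greedy nearest-neighbour heuristic under the L1 metric; it carries no optimality guarantee in general, but it gives a concrete number to compare other orderings against. A sketch:

```python
def l1(p, q):
    """Manhattan (taxicab) distance between two 2D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def path_length(order):
    """The objective: sum of distances between consecutive points."""
    return sum(l1(order[i], order[i + 1]) for i in range(len(order) - 1))

def nearest_neighbour_order(points):
    """Greedy open tour: start at the first point, always hop to the
    closest unvisited point (O(n^2) as written)."""
    remaining = list(points)
    order = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda p: l1(order[-1], p))
        remaining.remove(nxt)
        order.append(nxt)
    return order
```

With path_length as the explicit objective, local-search improvements (2-opt swaps, restarts from different start points) can then be measured against it.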

by EvilJS at September 04, 2015 07:19 PM


Quantum algorithms based on transforms other than Fourier transforms

In Quantum Computation and Quantum Information by Nielsen and Chuang they say that many of the algorithms based on quantum Fourier transforms rely on the Coset Invariance property of Fourier transforms and suggests that invariance properties of other transforms might yield new algorithms.

Has there been any fruitful research on other transforms?

by Sam Burville at September 04, 2015 07:17 PM



xterm-256 only supports 144 colors?

Is there a way to get it to support all 256? Or another terminal to use? I'm working on building a happy dev box with FreeBSD, but can't get my Vim color schemes to load.

submitted by desnudopenguino
[link] [3 comments]

September 04, 2015 06:58 PM

Daniel Lemire

Revisiting Vernor Vinge’s “predictions” for 2025

Vernor Vinge is a retired mathematics professor who became famous through his science-fiction novels. He is also famous as being one of the first to contemplate the idea of a “technological singularity“.

There is debate as to what the technological singularity is, but the general idea goes as follows. At some point in the near future (maybe between 2025 and 2050), we shall get to a point technologically where most human beings cannot make sense of the world anymore. We get a form of technological “hyperinflation“: even if you can make sense of the technology we had last year, you are hopelessly lost when looking at this year’s technology. Historically, we have almost always experienced accelerating technological progress… there are more new inventions and innovations in 2015 than in any other year in our history… At some point, the rate of change might get too fast for most of us. Vinge chose the term “singularity” to describe such an event, not because we become infinitely wealthy, or machines become infinitely intelligent, but because, as in a black hole, it is an event that you cannot “see” even as you get closer… Your models of reality break down at that point.

His 2006 novel, Rainbows End, paints a world going through such a singularity. It is in our near future too (2025). In this world, information technology has made such progress that anyone who hasn’t kept up-to-date for the last ten years is hopelessly obsolete… as if, say, you tried to live in 2015 through the lens of someone from 1970. Any intellectual asleep from 1970 to today would need to go back to basic training… learn how to use a computer, maybe starting from a PC. This may require high-school-level retraining. Now imagine that the next ten years (hypothetically) bring about as much change as we underwent in the last 35 years. Most people under 20 today would adapt, though increasing numbers would fall by the wayside. However, most people over 50 today might end up being pushed aside, as the rate of change becomes faster than they can cope with.

Writing near-future science-fiction is very difficult as it becomes almost instantly obsolete. And you will find very little near-future science-fiction for this very reason. But Vinge was no doubt well aware of this challenge and he is a smart fellow, so maybe his “predictions” for year 2025 are going to hold out better than, say, mine would.

Let me review some of his predictions:

  • In his novel, many people earn a small income through informal part-time work with affiliate networks, doing random work. Today you can earn a small income through Amazon’s Mechanical Turk and there are many Uber-like services whereby individuals can earn small sums by doing various services. So this prediction is almost certainly coming true.
  • In his novel, we have ubiquitous high-bandwidth networking and zero privacy in 2025. That is, everyone is connected all the time, and hackers can track individuals easily. Given that your smart phone is tracking your every move already, this does not sound improbable.
  • In Rainbows End, we have ubiquitous augmented reality. You can “almost” appear to be at a meeting by projecting your image at a remote location so that people who are equipped with the proper gear can see you and get the impression that you are present. You can virtually repaint your house every day, and people with the right gear can see it as such. This is shown to have applications in gaming, education and medicine. Sadly, we have nothing close to that right now, and it is probably the most disrupting technology in the novel. We had Google Glass but the whole field of virtual/augmented reality has not been making much progress. Can we expect a breakthrough in the next ten years? Maybe. My understanding is that augmented reality is much harder to achieve than it appears. On a related note, hardly anyone has a PC in 2025. I have been (wrongly) predicting the death of the PC for many years now… with no luck… but if we get ubiquitous augmented reality, then a PC would really look old-school.

    In any case, why would ubiquitous augmented reality be so disrupting? Because it draws a sharp line between those who are in it and those who are not. If businesses were to begin relying on augmented reality, employees would have to pick it up quickly or be left behind.

  • In Vinge’s 2025, people can send silent messages without talking or (openly) typing. It does feel like the kind of application that could take off.
  • Vinge predicts self-driving cars that can take you somewhere, drop you off and go on to serve other needs. I think Google has shown that it is possible, soon.
  • There is no newspaper, not even the online equivalent. People get news somehow though.
  • It looks like neurodegenerative diseases have mostly been cured. The main character suffered from Alzheimer’s and came back. It is, of course, impossible to tell whether such a medical feat could happen in the next ten years… there are certainly many smart scientists trying to make it happen. However, obesity is still a problem and you can still die from a stroke.

    Many things are related in this respect. Better technology can help us better understand our body and our brain, but a better understanding of our brain could also help us build better technology (better AI). But we have to remain humble: we have known about Alzheimer’s disease for a century and we still have nothing that even looks like a cure from a distance. Naturally, these problems tend to be binary, one day you have no effective treatment, the next you do.

There are more surprises in the novel, but I do not want to include spoilers.

It is maybe interesting to note that while 2006 is not that far in the past, it is also before the first iPhone ever made it to the market. The first iPad came in 2010. Android did not exist in 2006. I think that Vinge had to make a real effort to imagine 2025 and it looks like he did well.

Vinge has predicted in public the singularity for 2030. The problem, of course, is that with such an ill-defined concept, we could say, if we wanted to, that we are going through the singularity right now. I am far more interested to know when augmented reality will be ubiquitous.

by Daniel Lemire at September 04, 2015 06:49 PM


What is the meaning of "coadd pixel"?

I keep coming across "co-added" or "coadd" pixels, e.g.

"A set of coadded and stored pixels obtained at a specific time is referred to as a cadence, and the total amount of time over which the data in a cadence is coadded is the cadence period. The two cadence periods in use are Long Cadence (LC) and Short Cadence (SC). Each cadence consists of a series of frames that each include a 6.02 s exposure time and a 0.52-s readout time. For Long Cadence, 270 frames are coadded, for a total of 1766 s = 0.49 h. For Short Cadence, 9 frames are coadded, for a total of 58.85 s. "

What does this mean? Is this something only used in astronomy?

submitted by Zeekawla99ii
[link] [17 comments]

September 04, 2015 06:39 PM


New AWS Quick Start – Magento for E-Commerce

Magento is a very popular open-source content management system for e-commerce sites. Sellers and developers appreciate its open architecture, flexibility, extensibility (hundreds of extensions), and back-end workflows that can be tailored to fit the unique needs of each customer.

Magento Community Edition (Magento CE) and Magento Enterprise Edition (Magento EE) are popular among our customers. Some of these deployments have been launched on AWS by way of partners such as Anchor Hosting, Elastera, Tenzing, Razorfish, and Optaros. Others have been launched via the AWS Marketplace (a search for Magento returns more than 20 listings). And still others have been launched in do-it-yourself form.

Today we are publishing a new Magento Quick Start Reference Deployment. This 29-page document will show you how to build an AWS cluster that runs version 1.9.2 of Magento Community Edition. It walks you through best practices, provides cost estimates, and outlines the recommended set of AWS components.

Using the AWS CloudFormation template referenced in the Quick Start, you can launch Magento into a new or existing Virtual Private Cloud (Amazon VPC). The template will create (if requested) the VPC, along with the necessary EC2 instances (auto scaled instances for the web server and a NAT instance for SSH connectivity), an Amazon Relational Database Service (RDS) instance running MySQL, and Elastic Load Balancing. It will also create the requisite IAM roles and security groups and configure the Auto Scaling to add more EC2 instances when traffic rises and remove them when it subsides. Here’s what the completed system looks like:


The Quick Start also includes a pointer to some sample data that you can download from the Magento site!





by Jeff Barr at September 04, 2015 06:38 PM


Binary code with constraint

Suppose I have an alphabet of n symbols. I can efficiently encode them with $\lceil \log_2n\rceil$-bit strings. For instance if n=8:
A: 0 0 0
B: 0 0 1
C: 0 1 0
D: 0 1 1
E: 1 0 0
F: 1 0 1
G: 1 1 0
H: 1 1 1

Now I have the additional constraint that each column must contain at most p bits set to 1. For instance for p=2 (and n=8), a possible solution is:
A: 0 0 0 0 0
B: 0 0 0 0 1
C: 0 0 1 0 0
D: 0 0 1 1 0
E: 0 1 0 0 0
F: 0 1 0 1 0
G: 1 0 0 0 0
H: 1 0 0 0 1

Given n and p, does an algorithm exist to find an optimal encoding (shortest length)? (And can it be proved that it computes an optimal solution?)


Two approaches have been proposed so far to estimate a lower bound on the number of bits $m$. The goal of this section is to assess which of the two approaches provides the better lower bound.

Yuval's approach is based on entropy and provides a very nice lower bound: $\frac{\log n}{h(p/n)}$ where $h(x) = -x\log x - (1-x)\log(1-x)$.

Alex's approach is based on combinatorics. If we develop his reasoning a bit more, it is also possible to compute a very good lower bound:

Given $m$, the number of bits, with $m \geq\lceil \log_2(n)\rceil$, there exists a unique $k$ such that $$ 1+\binom{m}{1} + \ldots +\binom{m}{k} \lt n \leq 1+\binom{m}{1} + \ldots + \binom{m}{k}+\binom{m}{k+1}$$ One can convince oneself that an optimal solution will use the codeword with all bits low, then the codewords with 1 bit high, 2 bits high, ..., k bits high. For the $n-1-\binom{m}{1}-\ldots-\binom{m}{k}$ remaining symbols to encode, it is not at all clear which codewords it is optimal to use, but for sure the weights $w_i$ of the columns will be bigger than they would be if we could use only codewords with $k+1$ bits high while keeping $|w_i - w_j| \leq 1$ for all $i, j$. Therefore one can lower bound $p=\max_i(w_i)$ with $$p_m = 0 + 1 + \binom{m-1}{2} +\ldots + \binom{m-1}{k-1} + \left\lceil \frac{\left(n-1-\binom{m}{1}-\ldots-\binom{m}{k}\right) (k+1)}{m} \right\rceil$$

Now, given $n$ and $p$, we can estimate $m$. We know that $p_m \leq p$, so if $p \lt p_{m'}$ then $m' \lt m$. This gives the lower bound for $m$: first compute the $p_m$, then find the biggest $m'$ such that $p \lt p_{m'}$.

This is what we obtain if we plot, for $n=1000$, the two lower bounds together (the lower bound based on entropy in green, the one based on the combinatorial reasoning above in blue):

Both look very similar. However, if we plot the difference between the two lower bounds, it is clear that the lower bound based on the combinatorial reasoning is better overall, especially for small values of $p$.


I believe that the problem comes from the fact that the inequality $H(X) \leq \sum H(X_i)$ is weaker when $p$ gets smaller, because the individual coordinates become correlated with small $p$. However this is still a very good lower bound when $p=\Omega(n)$.

Here is the script (python3) that was used to compute the lower bounds:

from scipy.misc import comb
from math import log, ceil, floor
from matplotlib.pyplot import plot, show, legend, xlabel, ylabel

# compute p_m
def lowerp(n, m):
  acc = 1
  k = 0
  while acc + comb(m, k+1) < n:
    acc += comb(m, k+1)
    k += 1

  pm = 0
  for i in range(k):
    pm += comb(m-1, i)

  return pm + ceil((n-acc)*(k+1)/m)

if __name__ == '__main__':
  n = 100

  # compute lower bound based on combinatorics
  offset = ceil(log(n)/log(2))
  pm = [lowerp(n, m) for m in range(offset, n)]
  mp = []
  p = 1
  i = len(pm) - 1
  while i >= 0:
    # find the biggest m' (index i) with p < p_{m'};
    # the lower bound on m for this p is then m' + 1
    while i >= 0 and pm[i] <= p:
      i -= 1
    mp.append(offset + i + 1)
    p += 1
  plot(range(1, p), mp)

  # compute lower bound based on entropy
  lb = [ceil(log(n)/(p/n*log(n/p)+(n-p)/n*log(n/(n-p)))) for p in range(1, p)]
  plot(range(1, p), lb)

  # plot diff
  plot(range(1, p), [a-b for a, b in zip(mp, lb)])
  show()

by user3017842 at September 04, 2015 06:35 PM





Computer Architecture help?

Currently taking Computer Architecture II at my university now. I took CompArch I in community college ~three years ago. Most of the knowledge obtained from that class has been forgotten.

We did a review the first day. Went over a few gates (e.g. AND, NAND, OR, etc...) and truth tables. No problem.

Started to go over adders, multiplexers, ALUs, D latches, D flip-flops... Hmm, confused.

We just had a quiz and I completely bombed it. I've been looking through the book but it's just not catching on. My professor reads directly off slides, which come from the book, so it's difficult for me to learn from that.

Now we are on processors and I don't want to fall too far behind!

Any books, youtube videos, help would be appreciated. Thanks.

submitted by odu_mark
[link] [8 comments]

September 04, 2015 05:34 PM



Zelda Chaos not working properly [on hold]

How come whenever I try to create's_Mask_Miscellaneous_Glitches_-_Day_Reset, it won't submit when I click "Save page"? I went to that page later to see if the change got saved without me seeing it, but it didn't. I can't contact Zelda Chaos directly about the problem because I don't see a way to contact that site.

by Timothy at September 04, 2015 05:26 PM



Using assertion library outside of testing for list manipulations

In my current project I do a lot of list manipulations on front end, and I realised that using something like chai or other assertion library could save me a lot of time and make code quite readable.

Though while Chaijs is cool and all, its API is not exactly functional or currying friendly. My questions are:

Is it actually a good idea to look for an assertion library to use outside of a testing environment for array manipulation/filtering? Or am I smoking crack?

And if yes, does anyone have a success story with any library?

Some simplified examples of what I'm trying to do:

const people = [
  {name: "David Smith", age: 25, livedIn: ["London", "Paris"]},
  {name: "John Smith", age: 30, livedIn: ["New York"]},
  {name: "John Gates", age: 25, livedIn: ["London", "New York"]}
];
Currently we are using plain closures, and when checks are easy enough, it's all good and shiny:

  // Let's find the 25 year olds
  .filter((person) => person.age === 25)
  // And see who among them is John
  .filter((person) => === 0);

Now that's all plain and easy, but then we need to start testing deep properties: person.names.firstName, or person.cities[<index>].postCode, and this is where an assertion library looks great on paper; chai or jasmine have methods to do just that, for example from the chaijs documentation:


For example with assertion library I can find people who lived in London:

  .filter((person) => {
    return expect(person.livedIn).to.include("London");
  });

Or not lived in London:

  .filter((person) => {
    return expect(person.livedIn).to.not.include("London");
  });

And of course ranges:

expect({price: 50}).to.have.property("price").within(40, 60);

Now while both chaijs and jasmine do functionally what I need, their API is not functional. I would love to have an API close to Ramda, i.e. composable functions, with rules first and iteratee last. Something along the lines of:

   .filter(within('age', 20, 40)) // Filter age property, to be 20 < age < 40
   .filter(contains('livedIn', 'London', 'Boston')) // Further filter people, by place they lived in.

Of course I don't want or need exactly that API. It can differ in any way as long as I can curry it.
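For what it's worth, a rules-first, data-last style like the one wished for above is easy to sketch by hand. Here is a minimal illustration in Python rather than JavaScript; the helper names `within` and `contains` simply mirror the hypothetical API from the question and are not from any real library:

```python
# Hypothetical rules-first, data-last predicate builders (sketch only).

def within(key, low, high):
    """Return a predicate checking low < item[key] < high."""
    return lambda item: low < item[key] < high

def contains(key, *values):
    """Return a predicate checking item[key] contains any of the values."""
    return lambda item: any(v in item[key] for v in values)

people = [
    {"name": "David Smith", "age": 25, "livedIn": ["London", "Paris"]},
    {"name": "John Smith", "age": 30, "livedIn": ["New York"]},
    {"name": "John Gates", "age": 25, "livedIn": ["London", "New York"]},
]

# Filter by age range, then further filter by places lived in.
result = [p for p in people
          if within("age", 20, 28)(p)
          and contains("livedIn", "London", "Boston")(p)]
# David Smith and John Gates both pass
```

Because each builder returns a plain one-argument predicate, the pieces compose and curry naturally, which is exactly the property the chai-style fluent chains lack.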

by David Sergey at September 04, 2015 04:53 PM

functional programming - call function with all parameters

I'm currently experimenting with functional programming in PHP. I would like some clarification about some function calls.

as an example:

function addition($num1){
    return function ($num2) use ($num1) {
        return $num1 + $num2;
    };
}

$add_2_to = addition(2);
echo $add_2_to(3);
echo $add_2_to(4);

Is there a way to call the addition function with all the parameters at once? I tried this way with no luck:

echo addition(2)(3);

by Arnaud Ncy at September 04, 2015 04:52 PM



Find maximum element in sorted arrays in logarithmic time

Been stuck on this for a while, would really appreciate some help:

Suppose you are given an array A[1...n] of sorted integers that has been circularly shifted k positions to the right. For example, [35,42,5,15,27,29] is a sorted array that has been circularly shifted k = 2 positions, while [27,29,35,42,5,15] has been shifted k = 4 positions. Give an algorithm for finding the maximum element in A that runs in O(log n) time.

The elements in A are distinct.

I understand that to achieve O(log n) time I'll probably have to search through the list by starting at the middle, and then going left or right, then splitting the list in half over and over, but I'm not sure how to attack it beyond that.
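That intuition is right: the standard trick is to binary-search for the rotation point by comparing the middle element with the last one, since with distinct elements the maximum sits just before the minimum. One possible sketch (in Python):

```python
def max_rotated(a):
    """Max of a sorted array of distinct ints that was circularly shifted.

    Binary search for the rotation point: if a[mid] > a[hi], the "drop"
    (and hence the minimum) lies strictly to the right of mid; otherwise
    it is at mid or to its left.  Runs in O(log n).
    """
    lo, hi = 0, len(a) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] > a[hi]:
            lo = mid + 1      # minimum is right of mid
        else:
            hi = mid          # minimum is at mid or left of it
    # lo is now the index of the minimum; the maximum sits just before it
    # (wrapping around when the array was not shifted at all).
    return a[lo - 1] if lo > 0 else a[-1]
```

On the question's examples, `max_rotated([35, 42, 5, 15, 27, 29])` and `max_rotated([27, 29, 35, 42, 5, 15])` both return 42.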

by user2317714 at September 04, 2015 04:51 PM


Stateful computation with different types of short-circuit (Maybe, Either)

I am trying to find the most elegant way of converting the following stateful imperative piece of code to a pure functional representation (preferably in Haskell, to use the abstractions its Monad implementations offer). However I am not yet good at combining different monads using transformers and the like. It seems to me that analyzing others' takes on such tasks helps best when learning how to do it myself. The imperative code:

while (true) {
  while (x = get()) { // Think of this as returning Maybe something
    put1(x) // may exit and present some failure representation
  }
  put2() // may exit and present some success representation
}

When get returns Nothing we need the execution to continue with put2; when get returns Just x we want x to be passed to put1 and to short-circuit only if put1 fails, or loop otherwise. Basically put1 and put2 may terminate the whole thing or move to the following statement, changing the underlying state somehow. get can either succeed and invoke put1 and loop, or fail and continue to put2.

My idea was something along:

forever $ do
  forever (get >>= put1)
  put2

And why I was looking for something like that is because (get >>= put1) could simply short-circuit whenever get has nothing to return or put1 terminates. Similarly put2 terminates the outer loop. However I am not sure how to mix the State with the necessary Maybe and/or Either to achieve this.

I think using transformers to combine State and the other monads is necessary, and thus the code will most probably not be that succinct. But I guess it might not be much worse.

Any suggestion how to achieve the translation elegantly is welcome. This differs from "Stateful loop with different types of breaks" in avoiding explicit control-flow using if, when, while and rather tries to encourage use of Maybe, Either, or some other handy >>= semantics. Also there is always a straight-forward way how to translate the code into a functional one, however it can hardly be considered elegant.

by Jakub Daniel at September 04, 2015 04:37 PM




High Scalability

Stuff The Internet Says On Scalability For September 4th, 2015

Hey, it's HighScalability time:

An astonishing 300 billion stars in our galaxy have planets. Take a look in the Eyes on Exoplanets app.
  • 1 billion: people who used Facebook in a single day; 2.8 million: sq. ft. in new Apple campus (with drone pics);  1.1 trillion: Apache Kafka messages per day; 2,000 years: age of termite mounds in Central Africa; 30: # of times the human brain is better than the best supercomputers; 4 billion: requests it took to trigger an underflow bug.

  • Quotable Quotes:
    • Sara Seager: If an Earth 2.0 exists, we have the capability to find and identify it by the 2020s.
    • Android Dick: But you’re my friend, and I’ll remember my friends, and I’ll be good to you. So don’t worry, even if I evolve into Terminator, I’ll still be nice to you. I’ll keep you warm and safe in my people zoo, where I can watch you for ol’ times sake.
    • @viktorklang: "If the conversation is typically 'scale out' versus 'scale up': if we're coordination-free, we get to choose 'scale out' while 'scaling up'."
    • Amir Najmi: At Google, data scientists are just too much in demand. Thus, anytime we can replace data scientist thinking with machine thinking, we consider it a win.
    • @solarce: "don’t be be content that the software seems to basically work — you must beat the hell out of it" -- @bcantrill
    • John Ralston Saul: I have enormous confidence in the individual as citizen. I don't think there is any proof in our 2,500 years of history that the elites do a good job without the close involvement of the citizenry.
    • Joshua Strebel: on average Aurora RDS is 3x faster than MySql RDS when used with WordPress.
    • Martin Thompson: I'd argue that "state of the art" in scalable design is to have no contention. It does not matter if you manage contention with locks or CAS techniques. Once you have contention then Universal Scalability Law kicks in as you have to face the contention and coherence penalty that contented access to shared state/resources brings. Multiple writers to shared state is a major limitation to the scalability of any design. Persistent data structures make this problem worse and not better due to path-copy semantics that is amplified by the richness of the domain model.

  • Mike Hearn in a great interview on a16z, Hard Forks, Hard Choices for Bitcoin, had much to say on the future scalability of Bitcoin. One of the key ideas is that many of the things that people love about Bitcoin are based on Bitcoin's decentralized nature. Characteristics like it is permissionless, that it's the new gold, that it doesn't have a centralized policy committee, that it's a global network, and that it's a platform you can innovate on top of. One of the challenges with keeping the current block size is that decentralization is already under stress. A certain amount of centralization has crept in with ever bigger and bigger miners. With the collaboration of three or four companies they could start to apply some policy influence to Bitcoin, and that would erode all the interesting properties that people love about Bitcoin. The challenge is to scale Bitcoin in balance with decentralization. Scaling and security as encapsulated by decentralization are tradeoffs. You can scale massively and lose decentralization, at which point Bitcoin becomes PayPal. Yet if you keep the block size the same you make it so Bitcoin can't be used by a worldwide audience.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

by Todd Hoff at September 04, 2015 03:56 PM




Proving equality between foldl recursive and iterative fold

Hi, I have two definitions of fold. I will call them foldl, which is recursive, and fold$_{itr}$, which is iterative.

I am looking for an algebraic proof that the two definitions are equal ideally through structural induction.

Definition for foldl

foldl(c,h) nil = c

foldl(c, h) (list, element) = h (foldl (c, h) list, element)

Definition for fold$_{itr}$

fold$_{itr}$ (c, h) nil = c

fold$_{itr}$ (c, h) (element, list) = fold$_{itr}$ (h(c, element), h) list
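Not the requested structural-induction proof, but the claimed equality can at least be stated executably. A Python sketch of the two definitions (with the iterative base case returning the accumulator $c$, as it must for the equality to hold); since subtraction is not associative, agreement on it shows both really fold in the same order:

```python
def foldl_rec(c, h, xs):
    """Recursive left fold, peeling the *last* element:
    foldl (c, h) (list, element) = h (foldl (c, h) list, element)."""
    if not xs:
        return c
    return h(foldl_rec(c, h, xs[:-1]), xs[-1])

def foldl_itr(c, h, xs):
    """Iterative left fold, threading the accumulator forward:
    fold_itr (c, h) (element, list) = fold_itr (h (c, element), h) list."""
    for x in xs:
        c = h(c, x)
    return c

sub = lambda a, x: a - x
# Both compute (((0 - 1) - 2) - 3) - 4 = -10
assert foldl_rec(0, sub, [1, 2, 3, 4]) == foldl_itr(0, sub, [1, 2, 3, 4]) == -10
```

The induction hypothesis one would want is the generalized statement fold$_{itr}$ (c, h) list = foldl (c, h) list for every accumulator $c$, proved by induction on the list.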

by Abdul Rahman at September 04, 2015 03:42 PM


How to show that this exponential utility function is wealth-independent?

I have a question on the following exercise from chapter 9 of D. Luenberger, Investment Science, International Edition.

Exercise 2 (Wealth Independence)

Suppose an investor has exponential utility function $U(x) = -e^{-ax}$ and an initial wealth level of W. The investor is faced with an opportunity to invest an amount $w \le W$ and obtain a random payoff $x$.

Show that his evaluation of this incremental investment is independent of W.

I first considered that the utility after the investment is $-e^{-aW}e^{-a(x-w)}$, and that this is just the original utility of $-e^{-aW}$ times a factor that is independent of W, and that that's the solution.

However, this doesn't really take into consideration the randomness of x. For instance, I know of one possible example where if $x$ takes only 2 values with probability of $\frac{1}{2}$ each then it can be shown that $w$, the absolute value amount to be invested, is the exact same for any initial wealth $W$.

So if it's the case that this question requires a similar result for any random payoff $x$, how would I go about doing that if I don't know the probability function of $x$?

Perhaps I should consider $w$ as a proportion of $W$ and then show somehow that this proportion is equal to some constant $\frac{k}{W}$ when $E(U(x))$ is maximized with respect to $w$.

If this is the approach, how do you differentiate $E(U(x))$ with respect to $w$?
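One way to see it without needing the distribution of $x$ at all: with final wealth $W - w + x$,

$$E[U(W - w + x)] = E\left[-e^{-a(W-w+x)}\right] = e^{-aW}\,E\left[-e^{-a(x-w)}\right]$$

The factor $e^{-aW} \gt 0$ is the same for every candidate investment $(w, x)$, so multiplying by it never changes which investment has the higher expected utility. The ranking of incremental investments, and hence the optimal $w$, is therefore independent of $W$, whatever random payoff $x$ happens to be.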

by James at September 04, 2015 03:31 PM


Would $\sf RP = NP$ imply $\sf NP = coNP$?

If $\sf RP = NP$ then the hierarchy collapses to its second level (by the Karp-Lipton theorem). But what about $\sf NP$ and $\sf coNP$?

I tried to prove that $\sf BPP$ is contained in $\sf NP$ (the other direction is trivial if $\sf RP = NP$) but to no avail, and I'm not even sure that it's true.

What do you think?

by Robert777 at September 04, 2015 03:26 PM


Matlab Portfolio Optimization with bid ask spread

I'm trying to find the optimal portfolio of options and stock that minimizes the standard deviation of the portfolio returns while also taking into consideration the bid and ask prices of the assets. I am given a snapshot of the stock and related options (all with the same maturity but different strikes) with their bid and ask prices.

I am making the assumption that the stock price follows a geometric brownian motion and that infinite amounts of each asset can be sold/bought at their bid/ask price.

My approach, if there was only a mid price would be to simulate returns for each asset, calculate the average returns and the covariance matrix then pass this into a quadprog function in MATLAB to minimize the variance of the portfolio with no upper or lower bounds on the quantity of each asset (i.e. short selling is allowed).

However, with the bid/ask spread, the return on selling an asset is slightly different from the return on buying that same asset, and so passing the returns and covariances in the same way as above would not be accurate.

Can anyone suggest a plausible method to account for this bid/ask spread? I thought about treating the buy and the sell of an asset as two separate assets, calculating their respective returns and altering the constraints to accommodate the buying and selling direction but it gets very messy and my results don't look correct. Is there an easier way to do this?

Thanks, Michael.

by Michael Watters at September 04, 2015 03:25 PM


How is uniqueness of MAC addresses enforced? [migrated]

Again and again I read that devices are uniquely identified by their MAC addresses. But how are MAC addresses determined? (And do they tell us anything about the device which they represent?)

(Sorry if this is the wrong SE site for this question. Please vote for removal to another if you have a recommendation.)

(Further apologies for my choice of tag. Please recommend/create another if appropriate. I thought hardware, uniqueness, or addresses might be suitable, but I lack the privilege of creating new tags.)

by JellicleCat at September 04, 2015 03:25 PM


Dear readers, you must be very strong now. The ...

Dear readers, you must be very strong now. The BND has lied. No, really! I know!!
According to SPIEGEL reporting, in Operation "Glotaic" between 2004 and 2006, telephone and fax traffic of the US provider MCI was intercepted at its German site in Hilden. The audio data of intercepted calls was "routed directly to the USA" so that "the audio function works without dropouts," as a confidential BND paper puts it.

Until now, BND employees had told the Bundestag's NSA inquiry committee that all calls were filtered and checked by the BND.

I tell you, if someone had told us beforehand that our own intelligence services would lie to us this brazenly, we would have shut them down years ago!1!!

Oh, wait ...

September 04, 2015 03:01 PM

There is now a black market for Syrian passports, ...

There is now a black market for Syrian passports, because you have a better chance of asylum in Germany if you can present a Syrian passport.

September 04, 2015 03:01 PM




Why is the tape not part of the definition of a Turing Machine?

I've wondered why the tape/tapes are not part of the formal definition of a Turing Machine. Consider, for example, the formal definition of a Turing machine on the Wikipedia page. The definition, following Hopcroft and Ullman, includes: the finite set of states $Q$, the tape alphabet $\Gamma$, the blank symbol $b \in \Gamma$, the initial state $q_0\in Q$, the set of final states $F\subseteq Q$, and the transition function $\delta:(Q\backslash F)\times \Gamma\rightarrow Q\times\Gamma\times\{L,R\}$. None of which is the tape itself.

A Turing Machine is always considered to work on a tape, and the transition function is interpreted as moving its head, substitution of symbol and changing state. So, why is the tape left out of the mathematical definition of a Turing machine?

From what I can see, the formal definition in itself doesn't seem to imply that the Turing Machine operates like it's often described informally (with a head moving around on a tape). Or does it?
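One way to make the split concrete: the tuple is the machine's *program*, while the tape belongs to the *configuration* that evolves during a run. A toy simulator sketch in Python (the example machine and all names are made up for illustration) takes only the tuple-style data and supplies a tape per run:

```python
def run_tm(delta, q0, accept, blank, tape_input, max_steps=10_000):
    """Simulate a Turing machine given only its formal definition.

    The machine itself is just (delta, q0, accept, blank); the tape is
    per-run data, modeled as a dict position -> symbol so it is
    unbounded in both directions and blank everywhere by default.
    """
    tape = {i: s for i, s in enumerate(tape_input)}
    state, head = q0, 0
    for _ in range(max_steps):
        if state in accept:
            return state, tape
        sym = tape.get(head, blank)          # read
        state, write, move = delta[(state, sym)]
        tape[head] = write                   # write
        head += 1 if move == 'R' else -1     # move
    raise RuntimeError("no halt within step bound")

# Toy machine: flip every bit, halt on blank.
delta = {
    ('q', '0'): ('q', '1', 'R'),
    ('q', '1'): ('q', '0', 'R'),
    ('q', '_'): ('halt', '_', 'R'),
}
state, tape = run_tm(delta, 'q', {'halt'}, '_', '011_')
# tape positions 0..2 now hold '1', '0', '0'
```

The same `delta` can be run on any number of different tapes, which is exactly why the tape is kept out of the definition: it is input/working storage, not part of the machine.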

by Nicolas Lykke Iversen at September 04, 2015 02:46 PM




Is this permutation-sum problem NP-complete?

A new, tighter tardiness bound has been found for global Earliest-Deadline-First scheduling of jobs on symmetric multiprocessors. But this bound seems to be particularly hard to compute. In particular, we were able neither to find a polynomial-time algorithm to compute this bound, nor to prove that the problem is NP-hard. In this respect, a sufficient condition for the problem to be NP-hard is that the following sub-problem is NP-complete.

Given:


  • a set $\tau$ of $N$ pairs $(C_i, U_i) \in \mathbb{Q}^2$, with $C_i > 0$ and $0 < U_i \le 1$ for all $i \in \{1, 2, \ldots, N\}$,
  • the smallest natural number $M$ such that $\sum_{i=1}^{N} U_i \le M$ (which implies $M \le N$, because $U_i \le 1$),
  • an integer $M'$ with $1 \le M' \le M$,
  • a strictly positive value $b \in \mathbb{Q}$;

does a permutation $\tau' = ( (C_{i_1},U_{i_1}), (C_{i_2},U_{i_2}), \ldots, (C_{i_{M'-1}}, U_{i_{M'-1}}))$ of $M' - 1$ pairs in $\tau$ exist, such that $$ \sum_{g = 1}^{M'-1}\frac{C_{i_g}}{M - \sum_{v=1}^{g-1}U_{i_v}} \ge b \quad ? $$

Any suggestion for proving that this problem is NP-hard, or, on the other hand, for finding a way to solve the problem in polynomial time?
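For tiny instances the decision problem can at least be checked by brute force, which may help when experimenting with candidate reductions. A hedged Python sketch (the helper name is made up; exponential time, illustration only):

```python
from itertools import permutations

def has_heavy_permutation(tau, M, M_prime, b):
    """Brute-force check of the decision problem: does an ordered
    selection of M'-1 pairs (C_i, U_i) from tau satisfy
    sum_g C_{i_g} / (M - sum_{v<g} U_{i_v}) >= b ?"""
    for perm in permutations(tau, M_prime - 1):
        total = 0.0
        used_u = 0.0          # sum of U's of the pairs placed before position g
        for C, U in perm:
            total += C / (M - used_u)
            used_u += U
        if total >= b:
            return True
    return False
```

For example, with `tau = [(3, 1.0), (1, 0.6)]` we have $\sum U_i = 1.6$ so $M = 2$; taking $M' = 2$, the best single-pair choice yields $3/2 = 1.5$, so the answer is yes for $b = 1$ and no for $b = 2$.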

by Paolo Valente at September 04, 2015 01:39 PM



Extracting Default probability from a single CDS

I have to find a CDS's default probability using the simplest Poisson process (constant intensity).

I'm wondering how to get this estimate if I only have a CDS with a maturity of 5 years.

If I had different maturities I could assume, for example, that the PD (probability of default) for 1 year is the one extracted from the 1-year CDS; the PD for 2 years is then extracted from the CDS with 2-year maturity, assuming a PD for the first year equal to the one evaluated before (the 1-year PD), and so on.

So, my question is:

if I have just one CDS, as for instance in the case of the iTraxx Crossover index, can I compute the PD assuming that this probability remains constant throughout the 5 years?

Do alternative solutions exist?
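Under a constant-intensity model, a common back-of-the-envelope route is the "credit triangle": the hazard rate is approximately the spread over the loss given default, $\lambda \approx s/(1-R)$, and the default probability over horizon $t$ is then $1 - e^{-\lambda t}$. A hedged Python sketch (the numbers below are made up, and this ignores discounting and premium accrual details):

```python
from math import exp

def flat_hazard_pd(spread, recovery, t):
    """Constant-intensity ("credit triangle") approximation:
    hazard rate lam = spread / (1 - recovery), PD(t) = 1 - exp(-lam * t)."""
    lam = spread / (1.0 - recovery)
    return 1.0 - exp(-lam * t)

# e.g. a 5y CDS quoted at 300 bps with an assumed 40% recovery:
pd_5y = flat_hazard_pd(0.03, 0.40, 5.0)   # ~0.22
```

With a single quote this flat-hazard assumption is essentially forced; a full term structure of quotes is what would let you bootstrap a piecewise-constant intensity instead.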

by sciury at September 04, 2015 01:24 PM

Difference between Sharpe Ratio and Information Ratio

I am finding it difficult to understand the difference between the Sharpe ratio and the information ratio and the relationship between the two, and cannot find a decent reference that breaks it down in terms of the actual mathematical definitions. The Wiki page for the two is quite confusing as well, since it defines the Sharpe ratio almost identically to the information ratio, but the SR is in terms of a single asset while the IR is in terms of a portfolio of assets.

update: for clarity, the definiton of sharpe ratio (Wiki): $$ S_a = \frac{E(R_a-R_b)}{\sqrt{Var(R_a - R_b)}} $$ and definition of information ratio: $$ IR_p = \frac{E(R_p-R_b)}{\sqrt{Var(R_p - R_b)}} $$ where a is the reference asset, b is the benchmark and p is the reference portfolio
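As the two definitions suggest, the formulas have exactly the same shape; what changes is only what plays the role of the numerator series and the benchmark. A small illustrative Python sketch (the function name is made up):

```python
from statistics import mean, stdev

def excess_ratio(returns, benchmark):
    """Mean excess return divided by the standard deviation of excess returns.

    Fed a single asset's returns with the risk-free rate as benchmark,
    this is the (ex-post) Sharpe ratio; fed a portfolio's returns with an
    index as benchmark, it is the information ratio.  Same formula,
    different inputs.
    """
    diffs = [r - b for r, b in zip(returns, benchmark)]
    return mean(diffs) / stdev(diffs)
```

So the distinction is conventional rather than mathematical: the Sharpe ratio typically benchmarks against the risk-free rate, while the information ratio benchmarks an actively managed portfolio against its reference index, making the denominator a tracking error.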

by dimebucker91 at September 04, 2015 01:20 PM


Alternative proofs of Schwartz–Zippel lemma

I'm only aware of two proofs of Schwartz–Zippel lemma. The first (more common) proof is described in the wikipedia entry. The second proof was discovered by Dana Moshkovitz.

Are there any other proofs which use substantially different ideas?

by Dai Le at September 04, 2015 01:19 PM

Planet Emacsen

Julien Danjou: Data validation in Python with voluptuous

Continuing my post series on the tools I use these days in Python, this time I would like to talk about a library I really like, named voluptuous.

It's no secret that most of the time, when a program receives data from the outside, it's a big deal to handle it. Indeed, most of the time your program has no guarantee that the stream is valid and that it contains what is expected.

The robustness principle says you should be liberal in what you accept, though that is not always a good idea either. Whatever policy you choose, you need to process those data and implement a policy that will work – lax or not.

That means that the program needs to look into the data received, check that it finds everything it needs, complete what might be missing (e.g. set some defaults), transform some data, and maybe reject those data in the end.

Data validation

The first step is to validate the data, which means checking all the fields are there and all the types are right or understandable (parseable). Voluptuous provides a single interface for all that called a Schema.

>>> from voluptuous import Schema
>>> s = Schema({
... 'q': str,
... 'per_page': int,
... 'page': int,
... })
>>> s({"q": "hello"})
{'q': 'hello'}
>>> s({"q": "hello", "page": "world"})
voluptuous.MultipleInvalid: expected int for dictionary value @ data['page']
>>> s({"q": "hello", "unknown": "key"})
voluptuous.MultipleInvalid: extra keys not allowed @ data['unknown']

The argument to voluptuous.Schema should be the data structure that you expect. Voluptuous accepts any kind of data structure, so it could also be a simple string or an array of dict of array of integer. You get it. Here it's a dict with a few keys that if present should be validated as certain types. By default, Voluptuous does not raise an error if some keys are missing. However, it is invalid to have extra keys in a dict by default. If you want to allow extra keys, it is possible to specify it.

>>> from voluptuous import Schema
>>> s = Schema({"foo": str}, extra=True)
>>> s({"bar": 2})
{"bar": 2}

It is also possible to make some keys mandatory.

>>> from voluptuous import Schema, Required
>>> s = Schema({Required("foo"): str})
>>> s({})
voluptuous.MultipleInvalid: required key not provided @ data['foo']

You can create custom data types very easily. Voluptuous data types are actually just functions that are called with one argument, the value, and that should either return the value or raise an Invalid or ValueError exception.

>>> from voluptuous import Schema, Invalid
>>> def StringWithLength5(value):
... if isinstance(value, str) and len(value) == 5:
... return value
... raise Invalid("Not a string with 5 chars")
>>> s = Schema(StringWithLength5)
>>> s("hello")
'hello'
>>> s("hello world")
voluptuous.MultipleInvalid: Not a string with 5 chars

Most of the time though, there is no need to create your own data types. Voluptuous provides logical operators that, combined with a few other provided primitives such as voluptuous.Length or voluptuous.Range, can create a large range of validation schemes.

>>> from voluptuous import Schema, Length, All
>>> s = Schema(All(str, Length(min=3, max=5)))
>>> s("hello")
'hello'
>>> s("hello world")
voluptuous.MultipleInvalid: length of value must be at most 5

The voluptuous documentation has a good set of examples that you can check to have a good overview of what you can do.

Data transformation

What's important to remember is that each data type you use is a function that is called and returns a value, if the value is considered valid. That returned value is what is actually used and returned after the schema validation:

>>> import uuid
>>> from voluptuous import Schema
>>> def UUID(value):
... return uuid.UUID(value)
>>> s = Schema({"foo": UUID})
>>> data_converted = s({"foo": "uuid?"})
voluptuous.MultipleInvalid: not a valid value for dictionary value @ data['foo']
>>> data_converted = s({"foo": "8B7BA51C-DFF5-45DD-B28C-6911A2317D1D"})
>>> data_converted
{'foo': UUID('8b7ba51c-dff5-45dd-b28c-6911a2317d1d')}

By defining a custom UUID function that converts a value to a UUID, the schema converts the string passed in the data to a Python UUID object – validating the format at the same time.

Note a little trick here: it's not possible to use directly uuid.UUID in the schema, otherwise Voluptuous would check that the data is actually an instance of uuid.UUID:

>>> from voluptuous import Schema
>>> s = Schema({"foo": uuid.UUID})
>>> s({"foo": "8B7BA51C-DFF5-45DD-B28C-6911A2317D1D"})
voluptuous.MultipleInvalid: expected UUID for dictionary value @ data['foo']
>>> s({"foo": uuid.uuid4()})
{'foo': UUID('60b6d6c4-e719-47a7-8e2e-b4a4a30631ed')}

And that's not what is wanted here.

That mechanism is really neat to transform, for example, strings to timestamps.

>>> import datetime
>>> from voluptuous import Schema
>>> def Timestamp(value):
... return datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
>>> s = Schema({"foo": Timestamp})
>>> s({"foo": '2015-03-03T12:12:12'})
{'foo': datetime.datetime(2015, 3, 3, 12, 12, 12)}
>>> s({"foo": '2015-03-03T12:12'})
voluptuous.MultipleInvalid: not a valid value for dictionary value @ data['foo']

Recursive schemas

Voluptuous has one limitation so far: it cannot express recursive schemas directly. The simplest way to circumvent this is to use another function as an indirection.

>>> from voluptuous import Schema, Any
>>> def _MySchema(value):
... return MySchema(value)
>>> MySchema = Schema({"foo": Any("bar", _MySchema)})
>>> MySchema({"foo": {"foo": "bar"}})
{'foo': {'foo': 'bar'}}
>>> MySchema({"foo": {"foo": "baz"}})
voluptuous.MultipleInvalid: not a valid value for dictionary value @ data['foo']['foo']

Usage in REST API

I started to use Voluptuous to validate data in the REST API provided by Gnocchi. So far it has been a really good tool, and we've been able to create a complete REST API that is very easy to validate on the server side. I would definitely recommend it for that. It blends easily with any Web framework.

One of the upsides compared to solutions like JSON Schema is the ability to create or re-use your own custom data types while converting values at validation time. It is also very Pythonic and extensible – it's pretty great to use for all of that. It's also not tied to any serialization format.

On the other hand, JSON Schema is language agnostic and is serializable itself as JSON. That makes it easy to be exported and provided to a consumer so it can understand the API and validate the data potentially on its side.

by Julien Danjou at September 04, 2015 12:55 PM


Minkowski decomposition of lattice point cloud

Given two point clouds $A,B\subset\mathbb Z^d$, let $A\oplus B$ be their Minkowski sum, defined as the set $\{ a + b : a\in A, b\in B \}$.

Is there any known result for the following problem?

Given a point cloud $S\subset\mathbb Z^d$, are there two point clouds $A$ and $B$, none of them reduced to a single point, such that $A\oplus B = S$?

I am in particular interested in the case $d=2$.
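Both the sum and the decomposition test are straightforward to compute for small finite clouds, which gives at least a brute-force baseline for experimenting with the question. A Python sketch (the function names are mine; points are integer tuples):

```python
from itertools import product

def minkowski_sum(A, B):
    """Pointwise sums {a + b : a in A, b in B} of two point clouds in Z^2."""
    return frozenset((ax + bx, ay + by) for (ax, ay), (bx, by) in product(A, B))

def is_decomposition(A, B, S):
    """Check whether A (+) B = S, with neither factor reduced to a single point."""
    return len(A) > 1 and len(B) > 1 and minkowski_sum(A, B) == frozenset(S)

A = frozenset({(0, 0), (1, 0)})   # a horizontal domino
B = frozenset({(0, 0), (0, 1)})   # a vertical domino
S = minkowski_sum(A, B)           # the four corners of the unit square
print(sorted(S))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```

So the unit square decomposes, while e.g. a generic three-point cloud would not; the hard part of the question is of course avoiding the exponential search over candidate factors.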

A related result is the following: given a convex set of points¹ $S\subset\mathbb Z^2$, it is $\mathsf{NP}$-hard to decide whether it can be written as the sum of two convex sets $A$ and $B$ [1]. As far as I can tell, this does not imply a hardness result for the above problem, since the equality is on the convex hulls only, not on the point clouds. In the terminology used above, that problem only requires that $A\oplus B\subset S$. The problem on point clouds may well be easier (in particular, not $\mathsf{NP}$-hard) since it is much more constrained.

[1] Gao, S., & Lauder, A. G. (2001). Decomposition of polytopes and polynomials. Discrete & Computational Geometry, 26(1), 89-104.

¹ A subset $S\subset\mathbb Z^d$ is said convex if all points of $S$ lie on the boundary of the convex hull of $S$.

by Bruno at September 04, 2015 12:52 PM



What is this ratio: expected returns on stock divided by risk free rate?

So this ratio has come up in some work I'm doing and I can't seem to figure out if it is attested in the literature. Here's the setting:

Given a risk free rate $r(t)$ and a stock price which follows a geometric brownian motion, $\frac{dS(t)}{S(t)} = \mu_S(t)dt + \sigma_S(t)dW(t)$, what is the significance/definition/terminology for the ratio of expected growth in stock prices to the risk free rate, $\frac{\mu_S(t)}{r(t)}$? This is more of a question regarding vocabulary/interpretation, so any ideas would be greatly appreciated.

by pdevar at September 04, 2015 11:54 AM

Why the negative sign in modified duration relationship

If $P$ is price, $D$ modified duration and $y$ yield then we have the relationship,

$$dP=-D \cdot P \cdot dy$$

Why is there a minus sign, and what does it correspond to?
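A quick numeric example makes the sign concrete (a Python sketch with illustrative numbers, not market data):

```python
# First-order price change under modified duration: dP = -D * P * dy
P = 100.0    # bond price
D = 5.0      # modified duration, in years
dy = 0.001   # yields rise by 10 basis points
dP = -D * P * dy
print(dP)
# -0.5: a *rise* in yield produces a *fall* in price
```

The minus sign encodes the inverse relationship between price and yield: discounting fixed cash flows at a higher rate lowers their present value, so price moves opposite to yield.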

by sandstone at September 04, 2015 11:42 AM

Planet Emacsen

Irreal: Move Back to Last Misspelling

One of the handiest things I've learned from Pragmatic Emacs is using 【Ctrl+;】 to cycle through possible corrections for the last misspelled word. It's especially handy because the point doesn't have to be on or at the end of the word you want to correct. It can be after the word and other text as long as there are no intervening misspellings.

Ben Maughan is back with a related tip. Sometimes, flyspell can't find the proper spelling and you need to fix things up by hand. If you're still at the word, that's easy but if the point is after the word you need to jump back to the typo to make the correction. Maughan gives us a bit of Elisp to do just that. It's a small thing, of course, but as with all such micro-optimizations it makes our editing experience a bit more efficient and pleasant.

by jcs at September 04, 2015 11:29 AM





sheduling in parallel machines with machine delay and job constraints

Given m machines and n jobs, we want to schedule the jobs onto machines such that lateness, i.e. (actualCompletionTime − idealCompletionTime), is minimized. Also, a machine is required to take a break after job J_i, lasting alpha times the processing time of job J_i.

Keeping this in mind, how do I actually deduce NP-completeness? (I vaguely understand that it can perhaps be done via a reduction from bin packing.)

by ladyM at September 04, 2015 11:22 AM



Zip two Arrays, always 3 elements of the first array then 2 elements of the second

I've manually built a method that takes 2 arrays and combines them to 1 like this:


So I always take 3 elements of the first array, then 2 of the second one. As I said, I built that function manually.

Now I guess I could make this a one-liner instead with the help of zip. The problem is that zip alone is not enough, as zip builds tuples like (a0, b0).

Of course I can flatMap this, but still not what I want:

val zippedArray: List[Float] = (arrayA zip arrayB).flatMap(t => List(t._1, t._2)).toList

That way I'd get a List(a0, b0, a1, b1,...), still not what I want. (I'd then use toArray for the list... it's more convenient to work with a List right now)

I thought about using take and drop but they return new data-structures instead of modifying the old one, so not really what I want.
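The grouped-zip idea (chunk the first collection in threes, the second in twos, then interleave the chunks) translates directly; here is a Python sketch of the pattern, assuming both collections yield the same number of chunks:

```python
def interleave(a, b, na=3, nb=2):
    """Take na elements of a, then nb of b, repeating: a0 a1 a2 b0 b1 a3 ..."""
    chunks_a = [a[i:i + na] for i in range(0, len(a), na)]
    chunks_b = [b[i:i + nb] for i in range(0, len(b), nb)]
    out = []
    for ca, cb in zip(chunks_a, chunks_b):  # zip stops at the shorter side
        out.extend(ca)
        out.extend(cb)
    return out

print(interleave([1, 2, 3, 4, 5, 6], [10, 20, 30, 40]))
# [1, 2, 3, 10, 20, 4, 5, 6, 30, 40]
```

In Scala the analogous shape would be something like `(a.grouped(3) zip b.grouped(2)).flatMap { case (x, y) => x ++ y }.toList` (a sketch, untested), since `grouped` produces the fixed-size chunks that plain `zip` lacks.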

As you can imagine, I'm not really into functional programming (yet). I do use it and I see huge benefits, but some things are so different to what I'm used to.

by Teolha at September 04, 2015 10:46 AM


Scala The Good Parts = Kotlin?

I've just skimmed over the Databricks Scala Guide (Spark) and it looks like the Spark developers consider many of the more advanced features of Scala to be more harmful than useful.


Infix Methods

// Correct
string.contains("foo")

// Wrong
string contains "foo"

apply Method

Avoid defining apply methods on classes.

Call by Name

Avoid using call by name. Use () => T explicitly.

Multiple Parameter Lists

Avoid using multiple parameter lists.

Symbolic Methods (Operator Overloading)

Do NOT use symbolic method names, unless you are defining them for natural arithmetic operations (e.g. +, -, *, /).

One of the most abused features in Scala!


Implicits

Avoid using implicits, unless:

  • you are building a domain-specific language
  • you are using it for implicit type parameters (e.g. ClassTag, TypeTag)
  • you are using it private to your own class to reduce verbosity of converting from one type to another (e.g. Scala closure to Java closure)

Traversal and zipWithIndex

Use while loops instead of for loops or functional transformations (e.g. map, foreach). For loops and functional transformations are very slow (due to virtual function calls and boxing).

It looks like they are describing a subset of Scala which is more or less equivalent to Kotlin. That should set us thinking.

submitted by cryptos6
[link] [32 comments]

September 04, 2015 10:42 AM


Passing and executing functions with different parameter signatures

I am trying to construct some objects in a reasonably generic way. Some of the objects have constructor params, others don't.

What I am trying to achieve is to return some kind of builder function to which I can supply the constructor param if required.

I know I could have optional params passed down, but in my real scenario there are several layers, and I'm loath to add optional params down the hierarchy.

I'm not too up on partial application/currying, but could I use that here and if so, how?

Here's a bit of sample code - which won't work - to try and explain a bit more what I'm after.

public void Main()
{
    dynamic buildClass = ClassBuilder<BaseClass>(true);
    // ideally I'd like to be able to supply the constructor data here
    var theClass = buildClass(???);
}

public Func<???, TClass> ClassBuilder<TClass>(bool flag) where TClass : BaseClass
{
    // obviously this won't work since the delegates have different signatures
    if (flag) return () => GetClassA();
    return (x) => GetClassB(x);
}

public object GetClassA()
{
    return new ClassA();
}

public object GetClassB(string param)
{
    return new ClassB(param);
}

public class BaseClass {}

public class ClassA : BaseClass {}

public class ClassB : BaseClass
{
    private string _param;

    public ClassB(string param)
    {
        _param = param;
    }
}
Many thx


by Simon Woods at September 04, 2015 10:33 AM


[Hack request] Org-mode heading and blank lines

I've never been happy with the way blank lines are inserted with new headings in org-mode.

In short, I would like blank lines inserted only after headings that have text below them.

That is, when M-RET'ing at the end of this heading:

* Heading

I would like to insert a new heading without a blank line, like this:

* Heading
*

But when inserting a new heading after a heading that has text below it, like this one:

** Heading
Some text

I would like a blank line inserted, like this:

** Heading
Some text

*

The 'auto' option:

(setq org-blank-before-new-entry '((heading . t) (plain-list-item . nil))) 

... doesn't achieve this.

I have found another guy asking how to achieve the behavior described above but I haven't found a way to do it, so I'm wondering if any of you guys have a hack for this?

submitted by tty-tourist
[link] [1 comment]

September 04, 2015 10:28 AM

Fred Wilson

Feature Friday: Twitter Highlights

I found out about this feature here at AVC a few weeks ago so it is fitting that I’m telling you all how much I like it. I believe it is only available on Android right now. I could be wrong about that, but I think that is the case.

This morning I had this notification on my phone:


When I clicked on that notification, I got this:


As I swiped through these cards, I got these in sequence:







These highlights are personalized for me based on who I’m following and likely a bunch of other things. So not all of them will matter to you. But I can tell you that this is a damn good personalized set of highlights for me.

If you want this on your Android, go to your mobile notification settings:


And select the Highlights checkbox:


When you follow almost 1,000 people on Twitter, you need something like this to make sure you don’t miss the good stuff. I really like it.

by Fred Wilson at September 04, 2015 10:22 AM


Passing a Closure as an Argument to another Function won't work

I have a function evaluate that takes two arguments. The first argument is an Int. The second argument is a closure that takes an Int and returns a Double. The function evaluate then returns a [Double]. The k'th element of the returned array is the result of evaluating the second argument with the value k, for k = 1, ..., n.

func evaluate(n: Int, myFunction: Int -> Double) -> [Double] {
    var doubles = [Double]()
    for i in 1...n {
        doubles.append(myFunction(i))   // subscripting an empty array would crash
    }
    return doubles
}

let polyTableClosure: Int -> Double = { return Double($0 * $0 * $0 + 2 * $0 + 4) }

print(evaluate(5, polyTableClosure))

Expecting something like: [7, 16, 37, 76, 139]
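As a cross-check of that expected output, the same table in Python (a sketch; note it runs k from 1 to n, matching the loop and the expected values):

```python
def evaluate(n, f):
    """Apply f to 1..n and collect the results."""
    return [f(k) for k in range(1, n + 1)]

print(evaluate(5, lambda x: x * x * x + 2 * x + 4))
# [7, 16, 37, 76, 139]
```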

by Kranthi Kumar Gurrala at September 04, 2015 09:24 AM



How to decide what is the decision variant of some search/optimisation problem?

Consider the TSP: find a minimal hamiltonian circuit given some graph G.

Now the accepted decision variant seems to be: does some circuit exist with weight at most W? And we verify this by providing the verifier with a certificate C and simply adding up the weights of the edges in C.
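In code, the verification step just described (check the certificate is a Hamiltonian circuit, add up its edge weights, compare with the bound) is only a few lines; a Python sketch, where `w` is assumed to be a symmetric weight matrix and `cycle` a claimed vertex order:

```python
def verify_certificate(w, cycle, W):
    """Polynomial-time check: is `cycle` a Hamiltonian circuit of weight <= W?"""
    n = len(w)
    if sorted(cycle) != list(range(n)):
        return False  # not a permutation of all vertices
    total = sum(w[cycle[i]][cycle[(i + 1) % n]] for i in range(n))
    return total <= W

w = [[0, 1, 2],
     [1, 0, 1],
     [2, 1, 0]]   # toy symmetric weight matrix on 3 vertices
print(verify_certificate(w, [0, 1, 2], 4))
# True: the circuit 0-1-2-0 weighs 1 + 1 + 2 = 4
```

The runtime is O(n²) at worst, which is exactly why this variant puts TSP in NP.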

Now my question is, how did "they" decide that this is the decision variant of the problem and that it isn't: Given graph G, is certificate C the minimal circuit within G? This to me seems much more intuitive.

The accepted decision variant of TSP seems trivial and not very useful at all (since verifying it is simply adding up edge weights).

Of course, if the latter decision problem were the accepted self-reduction, then there wouldn't really be any obvious/known way to verify it in polytime, so the problem would be in EXPTIME.

But if so, then why don't we just come up with trivial decision variants for actual EXPTIME-complete problems and call it a day by saying they are NP-complete? Or is it a property of EXPTIME-complete problems that a trivial polytime verifier simply does not exist at all, no matter how much you "dumb it down"?

Sorry for the convoluted question, I am just really confused right now!

submitted by HoboAtUofT
[link] [4 comments]

September 04, 2015 09:06 AM

Planet Emacsen

Pragmatic Emacs: Edit files inside zip or tar archives

This is as easy as opening a .zip file in emacs. You will be shown a dired style file browser (in Zip-Archive mode) in which you can open files and edit and save them as normal, without having to extract them from the archive. You can also do a few other things to the files in the archive – use C-h m to see the help for the current major mode.

Similarly, opening a .tar (or even a compressed .tar.gz) archive will open the contents in Tar mode. This is very similar to the above, except that once you edit a file, you have to go back to the Tar mode buffer and save that to update the original archive file (this is automatic in Zip-Archive mode).

by Ben Maughan at September 04, 2015 09:00 AM


Modeling credit utilization and stock market growth

I am relatively new to financial mathematics, but I am wondering whether there exists a relationship between credit utilization (the rate at which the public accesses credit from financial institutions) and stock market growth (credit utilization's influence on the stock market) in an economy.

Please, I need advice on these questions: if there is a relationship, how can I best model this situation? If not, what would be the reasons for this?

Thank you very much

by KaRJ XEN at September 04, 2015 08:40 AM


Question about set-associative cache mapping

I found a question:

Consider a two-way set-associative cache of 4 memory blocks, each block containing one word.

What is the number of misses and hits considering the following sequence of block addresses: 0, 8, 0, 6, 8 ?

a) 4 misses and 1 hit
b) 3 misses and 2 hits
c) 1 miss and 4 hits
d) 2 misses and 3 hits

Although I understand how set-associative cache mapping works, I don't understand how you can tell whether an access will be a hit or a miss if you don't know the tags and what is stored in the cache.

Can you please point me in the right direction? How should I think about this problem?
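One way to reason about it: assume LRU replacement and that the block address modulo the number of sets selects the set (both standard textbook assumptions; the question is under-specified otherwise). With 4 blocks and 2 ways there are 2 sets, so every even address lands in set 0, and a small simulation settles the count:

```python
def simulate(addresses, num_blocks=4, ways=2):
    """Count (misses, hits) for a set-associative cache with LRU replacement."""
    num_sets = num_blocks // ways          # here: 2 sets of 2 blocks each
    sets = [[] for _ in range(num_sets)]   # each list is ordered LRU -> MRU
    misses = hits = 0
    for addr in addresses:
        s = sets[addr % num_sets]          # set index = block address mod #sets
        if addr in s:
            hits += 1
            s.remove(addr)
            s.append(addr)                 # refresh to most-recently-used
        else:
            misses += 1
            if len(s) == ways:
                s.pop(0)                   # evict the least-recently-used block
            s.append(addr)
    return misses, hits

print(simulate([0, 8, 0, 6, 8]))
# (4, 1): all five addresses map to set 0, and 6 evicts 8 before 8 returns
```

Under these assumptions the answer would be (a): 4 misses and 1 hit.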

by Cristi at September 04, 2015 08:26 AM



Is this permutation-sum problem NP-complete? [duplicate]

A new, tighter tardiness bound has been found for global Earliest-Deadline-First scheduling of jobs on symmetric multiprocessors. But this bound seems to be particularly hard to compute. In particular, we were not able either to find a polynomial-time algorithm to compute this bound, or to prove that the problem is NP-hard. In this respect, a sufficient condition for the problem to be NP-hard is that the following sub-problem is NP-complete.


Given:

  • a set $\tau$ of $N$ pairs $(C_i, U_i) \in \mathbb{Q}^2$, with $C_i > 0$ and $0 < U_i \le 1$ for all $i \in \{1, 2, \ldots, N\}$,
  • the smallest natural number $M$ such that $\sum_{i=1}^{N} U_i \le M$ (which implies $M \le N$, because $U_i \le 1$),
  • an integer $M'$ with $1 \le M' \le M$,
  • a strictly positive value $b \in \mathbb{Q}$;

does a permutation $\tau' = ( (C_{i_1},U_{i_1}), (C_{i_2},U_{i_2}), \ldots, (C_{i_{M'-1}}, U_{i_{M'-1}}))$ of $M' - 1$ pairs in $\tau$ exist, such that $$ \sum_{g = 1}^{M'-1}\frac{C_{i_g}}{M - \sum_{v=1}^{g-1}U_{i_v}} \ge b \quad ? $$

Any suggestion for proving that this problem is NP-hard, or, on the other hand, for finding a way to solve it in polynomial time?

by Paolo Valente at September 04, 2015 07:36 AM


Shortcut not working only in C++-mode

I have a method to duplicate the region, bound to C-d, but I couldn't make it work in C/C++ mode. I have used it in cperl-mode, ruby-mode and lisp-mode and it works. However, in C++-mode it only deletes a character.

Here is the code if it helps

(defun duplicate-current-line-or-region (arg)
  (interactive "p")
  (let (beg end (origin (point)))
    (if (and mark-active (> (point) (mark)))
        (exchange-point-and-mark))
    (setq beg (line-beginning-position))
    (if mark-active
        (exchange-point-and-mark))
    (setq end (line-end-position))
    (let ((region (buffer-substring-no-properties beg end)))
      (dotimes (i arg)
        (goto-char end)
        (newline)
        (insert region)
        (setq end (point)))
      (goto-char (+ origin (* (length region) arg) arg)))))

(global-set-key (kbd "C-d") 'duplicate-current-line-or-region)

and no, I am not the writer of this method, I guess I've found it in SO.

submitted by A_Seat_For_One
[link] [5 comments]

September 04, 2015 07:14 AM


How does a computer determine the data type of a byte?

For example, if the computer has 10111100 stored on one particular byte of RAM, how does the computer know to interpret this byte as an integer, ASCII character, or something else? Is type data stored in an adjacent byte? (I don't think this would be the case as this would result in using twice the amount of space for one byte.)

I suspect that perhaps the computer does not even know the type of the data, and that only the program using it knows. My guess is that, because RAM is RAM and therefore not read sequentially, a particular program just tells the CPU to fetch the info from a specific address, and the program defines how to treat it. This would seem to fit with programming concepts such as the need for typecasting.

Am I on the right track?

by HCBPshenanigans at September 04, 2015 07:12 AM


Making my Clojure map function implementation faster

I've been experimenting with Clojure lately. I tried writing my own map function (two actually) and timed them against the built in function. However, my map functions are way way slower than the built in one. I wanted to know how I could make my implementation faster. It should give me some insights into performance tuning Clojure algorithms I write. The first function (my-map) does recursion with recur. The second version (my-map-loop) uses loop/recur which was much faster than simply using recur.

(defn my-map
    ([func lst] (my-map func lst []))
    ([func lst acc]
        (if (empty? lst)
            acc
            (recur func (rest lst) (conj acc (func (first lst)))))))

(defn my-map-loop
    ([func lst]
        (loop [acc []
               inner-lst lst]
            (if (empty? inner-lst)
                acc
                (recur (conj acc (func (first inner-lst))) (rest inner-lst))))))

(let [rng (range 1 10000)]
    (time (map #(* % %) rng))
    (time (my-map #(* % %) rng))
    (time (my-map-loop #(* % %) rng)))

These are the results I got -

"Elapsed time: 0.084496 msecs"
"Elapsed time: 14.132217 msecs"
"Elapsed time: 7.324682 msecs"


After resueman pointed out that I was timing things incorrectly, I changed the functions to:

(let [rng (range 1 10000)]
  (time (doall (map #(* % %) rng)))
  (time (doall (my-map #(* % %) rng)))
  (time (doall (my-map-loop #(* % %) rng))))

These are the new results:

"Elapsed time: 9.563343 msecs"
"Elapsed time: 12.320779 msecs"
"Elapsed time: 5.608647 msecs"

"Elapsed time: 11.103316 msecs"
"Elapsed time: 18.307635 msecs"
"Elapsed time: 5.86644 msecs"

"Elapsed time: 10.276658 msecs"
"Elapsed time: 10.288517 msecs"
"Elapsed time: 6.19183 msecs"

"Elapsed time: 9.277224 msecs"
"Elapsed time: 13.070076 msecs"
"Elapsed time: 6.830464 msecs"

Looks like my second implementation is fastest of the bunch. Anyways, I would still like to know if there are ways to optimize it further.

by lightning at September 04, 2015 07:03 AM


Coming up: Massachusetts and Austin!


I’ve got just TWO MORE APPEARANCES this year! I hope you will come to BOTH of them, somehow.


Not ONLY is this going to be a fun time with pals and hangouts, but I’m ALSO going to be doing some presentations!

SATURDAY, 1:30 PM: True Stuff From Old Books with David Malki !

David Malki ! of Wondermark presents a slide show of fascinating, forgotten articles unearthed from Victorian-era newspapers and magazines. A man breathes fire! A steam-powered flying machine attempts its first flight! Inventors and adventurers dream big, and often die! And human nature remains unchanged through the ages.

SUNDAY, 2:00 PM: Go Away Sea Lions w/David Malki !, Spike Trotman, Kate Leth, R.K. Milholland

The internet! It connects everybody with everybody, at all times. Does this mean that every thought that flits through everyone’s head needs to be shouted constantly at strangers? POSSIBLY, but how is anyone supposed to know for sure? In this roundtable discussion, our panel of opinionated blowhards slash veterans of the troll wars will revisit another 90’s classic sorely in need of a grim, modern reboot: the concept of “netiquette”.

All this and Machine of Death game demos in the tabletop gaming section! There’s also workshops and stuff, but you have to sign up for those in advance to reserve your spot.  Tickets are on sale now, get yours posthaste!


I’m pleased to return to Dragon’s Lair in Austin this fall for this fun li’l event held at a great comics and games store! It’s a free event, just stop on by the store!

I am going to try NOT bringing a ton of books and stuff, and instead, doing a LOT more Roll-a-Sketches. I am curious to see how it goes! Come be part of the EXPERIMENT.

ANOTHER part of the experiment is “mentioning these events farther in advance than the week of”. I’ve tried doing it the old way, now let’s try doing it this way and see how it goes.

by David Malki at September 04, 2015 06:50 AM


What is the best book for a 15 year old to start algorithms, and wants to go to the IOI 2017? [on hold]

I am currently 15 and I want to go to the IOI. In order to do that, I need to learn about algorithms and data structures. I tried the CLRS book, but sadly the maths in it was too hard for me. I am an average maths student at school, but I am OK with programming and am currently learning Python. Can somebody recommend books for someone my age who wants to learn about algorithms and data structures, without the book being too heavy on math? I am willing to put all my time into practicing for the IOI over the next two years, because I would really like to compete in IOI 2017; all I need is some really good advice, resources and help from the people of Stack Overflow.

by jame_smith at September 04, 2015 06:33 AM




Difference between the two regular expression

I came across a question asking for a regular expression to check whether a string over the alphabet {a,b} contains an even number of b's. The expression I came up with is (a+ba*b)*. However, I found the solution online to be a*(ba*ba*)*. Is there any difference? If so, where?

Also, I know that (a+b)* is equivalent to (a*b*)*, so my solution comes out to (a*(ba*b))*. So why the extra a*?
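Both expressions can be checked mechanically against the intended language on all short strings (a Python sketch; I write the union + as alternation |, and assume the online solution carries a Kleene star over the whole group, i.e. a*(ba*ba*)*):

```python
import re
from itertools import product

even_bs = re.compile(r'(?:a|ba*b)*\Z')    # (a + ba*b)*
online  = re.compile(r'a*(?:ba*ba*)*\Z')  # a*(ba*ba*)*

# Exhaustively compare both regexes with "even number of b's" on short strings.
for n in range(9):
    for s in map(''.join, product('ab', repeat=n)):
        expected = s.count('b') % 2 == 0
        assert bool(even_bs.match(s)) == expected
        assert bool(online.match(s)) == expected
print("both match exactly the strings with an even number of b's")
```

So under that reading the two expressions denote the same language; the extra a* inside the group just makes the decomposition "a-block, then b-pair-with-a's-between, then a-block" explicit.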

by Adwait Kumar at September 04, 2015 04:27 AM


What's the best and most modern free online course for machine learning?

I'm taking a machine learning course this fall. I'd like to watch a whole series of lectures in advance because I'm cool like that. Suggestions? Or is 2008 Andrew Ng the winner?

submitted by Zulban
[link] [comment]

September 04, 2015 04:23 AM


Is it possible to boost the error probability of a Consensus protocol over dynamic network?

Consider the binary consensus problem in a synchronous setting over a dynamic network (thus, there are $n$ nodes, and some of them are connected by edges that may change from round to round). Given a randomized $\delta$-error protocol for Consensus in this setting, is it possible to use this protocol as a black box to obtain another Consensus protocol with smaller error?

Note that for problems with deterministic outcome (such as Sum), this can be done by repeatedly executing the protocol and taking the majority as the answer. However I am unable to figure out how for Consensus, so I wonder whether this is actually impossible to do.
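For the deterministic-outcome case mentioned above, the amplification argument is a standard Hoeffding-bound calculation, sketched here for completeness: run $k$ independent executions and take the majority answer. Each execution is correct with probability at least $1-\delta > \frac{1}{2}$, so

$$\Pr[\text{majority is wrong}] \le \exp\left(-2k\left(\tfrac{1}{2}-\delta\right)^2\right),$$

which can be driven below any target error by choosing $k$ large enough. The obstacle for Consensus is that correctness is not a fixed predicate of the input: different executions may legitimately decide different values, so there is no single "right answer" to take a majority over.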

--- Details ---

Dynamic Network. There are $n$ nodes on a dynamic network over synchronous setting (thus the protocol will proceed in rounds). Each node has a unique ID of size $\log(n)$. There are edges connecting pairs of nodes, and this set of edges may change from round to round (but remains fixed throughout any particular round). There may be additional restrictions to the dynamic network that allows the problem to be solved by some protocol (such as having a fixed diameter), but it is irrelevant to our discussion.

The only information known to each node is their unique ID. They do not know $n$ or the topology of the network. In particular, they do not know who their neighbors are (nodes that are adjacent to them with an edge).

How nodes communicate. The only method of communication between nodes is by broadcasting messages under the $CONGEST$ model (D. Peleg, "Distributed Computing, A Locality-Sensitive Approach"). More specifically, in each round, every node may broadcast an $O(\log n)$ sized message. Then, every node will receive ALL messages that were broadcasted by all of its neighbors in that particular round in an arbitrary order. Thus, in particular, nodes may attach their IDs onto their messages, since their IDs are of size $O(\log n)$.

Binary Consensus. The binary consensus problem over this network is as follows. Each node has a binary input ($0$ or $1$), and each must decide either $0$ or $1$, subject to the validity, termination, and agreement conditions for consensus with $\delta$ error:

  • validity means that for $z \in \{0, 1\}$, if all nodes have $z$ as its initial input, then the only possible decision is $z$.

  • termination means that each node must eventually decide.

  • agreement means that all nodes must have the same decision.

  • The error rate means that under worst case input and worst case network, the probability that the protocol does not satisfy any of the three conditions is at most $\delta$ over average coin flips of the protocol's coin.

Question. For a particular set of dynamic networks, suppose we have a lower bound on the average number of rounds incurred by any randomized $\delta$-error protocol $P$, with $\delta < \frac{1}{3}$, for solving binary consensus in this setting, under a worst-case dynamic network from this set, worst-case node inputs, and averaged over the protocol's coin flips. Does this bound automatically hold for any $\delta'$-error protocol with $\delta < \delta' < \frac{1}{3}$?

by Irvan at September 04, 2015 04:18 AM


Planet Theory


worst case time complexity [on hold]

For binary search I am able to calculate the average-case complexity, but I don't know how to find the worst-case complexity. Please help.
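The worst case can be observed directly by counting loop iterations (a Python sketch; searching for an element larger than everything in the array forces the search to exhaust the range):

```python
def binary_search_steps(arr, target):
    """Return (index or -1, number of loop iterations taken)."""
    lo, hi, steps = 0, len(arr) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid, steps
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps

# n = 1024: the worst case takes floor(log2 n) + 1 = 11 iterations
print(binary_search_steps(list(range(1024)), 2000))
# (-1, 11)
```

Each iteration halves the remaining range, so the loop can run at most about log2(n) + 1 times, giving a worst-case complexity of O(log n).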

by user153017 at September 04, 2015 02:33 AM

arXiv Networking and Internet Architecture

Performance Comparisons of Geographic Routing Protocols in Mobile Ad Hoc Networks. (arXiv:1509.01205v1 [cs.IT])

Geographic routing protocols greatly reduce the requirements of topology storage and provide flexibility in the accommodation of the dynamic behavior of mobile ad hoc networks. This paper presents performance evaluations and comparisons of two geographic routing protocols and the popular AODV protocol. The tradeoffs among the average path reliabilities, average conditional delays, average conditional numbers of hops, and area spectral efficiencies and the effects of various parameters are illustrated for finite ad hoc networks with randomly placed mobiles. This paper uses a dual method of closed-form analysis and simple simulation that is applicable to most routing protocols and provides a much more realistic performance evaluation than has previously been possible. Some features included in the new analysis are shadowing, exclusion and guard zones, distance-dependent fading, and interference correlation.

by Don Torrieri, Salvatore Talarico, Matthew C. Valenti at September 04, 2015 01:30 AM

Unmanned Aerial Vehicle with Underlaid Device-to-Device Communications: Performance and Tradeoffs. (arXiv:1509.01187v1 [cs.IT])

In this paper, the deployment of an unmanned aerial vehicle (UAV) as a flying base station used to provide on the fly wireless communications to a given geographical area is analyzed. In particular, the co-existence between the UAV, that is transmitting data in the downlink, and an underlaid device-todevice (D2D) communication network is considered. For this model, a tractable analytical framework for the coverage and rate analysis is derived. Two scenarios are considered: a static UAV and a mobile UAV. In the first scenario, the average coverage probability and the average sum-rate for the users in the area are derived as a function of the UAV altitude and the number of D2D users. In the second scenario, using the disk covering problem, the minimum number of stop points that the UAV needs to visit in order to completely cover the area is computed. Simulation and analytical results show that, depending on the density of D2D users, optimal values for the UAV altitude exist for which the average sum-rate and the coverage probability are maximized. Moreover, our results also show that, by enabling the UAV to intelligently move over the target area, the overall communication rate and coverage probability can be significantly improved. Finally, in order to provide a full coverage for the area of interest, the tradeoff between the coverage and delay, in terms of the number of stop points, is discussed.

by Mohammad Mozaffari, Walid Saad, Mehdi Bennis, Merouane Debbah at September 04, 2015 01:30 AM

Parallel Knowledge Embedding with MapReduce on a Multi-core Processor. (arXiv:1509.01183v1 [cs.DC])

This article firstly attempts to explore parallel algorithms of learning distributed representations for both entities and relations in large-scale knowledge repositories with {\it MapReduce} programming model on a multi-core processor. We accelerate the training progress of a canonical knowledge embedding method, i.e. {\it translating embedding} ({\bf TransE}) model, by dividing a whole knowledge repository into several balanced subsets, and feeding each subset into an individual core where local embeddings can concurrently run updating during the {\it Map} phase. However, it usually suffers from inconsistent low-dimensional vector representations of the same key, which are collected from different {\it Map} workers, and further leads to conflicts when conducting {\it Reduce} to merge the various vectors associated with the same key. Therefore, we try several strategies to acquire the merged embeddings which may not only retain the performance of {\it entity inference}, {\it relation prediction}, and even {\it triplet classification} evaluated by the single-thread {\bf TransE} on several well-known knowledge bases such as Freebase and NELL, but also scale up the learning speed along with the number of cores within a processor. So far, the empirical studies show that we could achieve comparable results as the single-thread {\bf TransE} performs by the {\it stochastic gradient descend} (SGD) algorithm, as well as increase the training speed multiple times via adapting the {\it batch gradient descend} (BGD) algorithm for {\it MapReduce} paradigm.

by <a href="">Miao Fan</a>, <a href="">Qiang Zhou</a>, <a href="">Thomas Fang Zheng</a>, <a href="">Ralph Grishman</a> at September 04, 2015 01:30 AM

Model Predictive Path Integral Control using Covariance Variable Importance Sampling. (arXiv:1509.01149v1 [cs.SY])

In this paper we present a Model Predictive Path Integral (MPPI) control algorithm that is derived from the path integral control framework and a generalized importance sampling scheme. In order to operate in real time we parallelize the sampling based component of the algorithm and achieve massive speed-up by using a Graphical Processor Unit (GPU). We compare MPPI against traditional model predictive control methods based on Differential Dynamic Programming (MPC-DDP) on a benchmark cart-pole swing-up task as well as a navigation tasks for a racing car and a quad-rotor in simulation. Finally, we use MPPI to navigate a team of three (48 states) and nine quad rotors (144 states) through cluttered environments of fixed and moving obstacles in simulation.

by <a href="">Grady Williams</a>, <a href="">Andrew Aldrich</a>, <a href="">Evangelos Theodorou</a> at September 04, 2015 01:30 AM

Cointegrating Jumps: an Application to Energy Facilities. (arXiv:1509.01144v1 [q-fin.PR])

Based on the concept of self-decomposable random variables we discuss the application of a model for a pair of dependent Poisson processes to energy facilities. Due to the resulting structure of the jump events we can see the self-decomposability as a form of cointegration among jumps. In the context of energy facilities, the application of our approach to model power or gas dynamics and to evaluate transportation assets seen as spread options is straightforward. We study the applicability of our methodology first assuming a Merton market model with two underlying assets; in a second step we consider price dynamics driven by an exponential mean-reverting Geometric Ornstein-Uhlenbeck plus compound Poisson that are commonly used in the energy field. In this specific case we propose a price spot dynamics for each underlying that has the advantage of being treatable to find non-arbitrage conditions. In particular we can find close-form formulas for vanilla options so that the price and the Greeks of spread options can be calculated in close form using the Margrabe formula (if the strike is zero) or some other well known approximation.

by <a href="">Nicola Cufaro Petroni</a>, <a href="">Piergiacomo Sabino</a> at September 04, 2015 01:30 AM

An Ant Colonization Routing Algorithm to Minimize Network Power Consumption. (arXiv:1509.01094v1 [cs.NI])

Rising energy consumption of IT infrastructure concerns have spurred the development of more power efficient networking equipment and algorithms. When \emph{old} equipment just drew an almost constant amount of power regardless of the traffic load, there were some efforts to minimize the total energy usage by modifying routing decisions to aggregate traffic in a minimal set of links, creating the opportunity to power off some unused equipment during low traffic periods. New equipment, with power profile functions depending on the offered load, presents new challenges for optimal routing. The goal now is not just to power some links down, but to aggregate and/or spread the traffic so that devices operate in their sweet spot in regards to network usage. In this paper we present an algorithm that, making use of the ant colonization algorithm, computes, in a decentralized manner, the routing tables so as to minimize global energy consumption. Moreover, the resulting algorithm is also able to track changes in the offered load and react to them in real time.

by <a href="">Miguel Rodr&#xed;guez P&#xe9;rez</a>, <a href="">Sergio Herrer&#xed;a Alonso</a>, <a href="">Manuel Fern&#xe1;ndez Veiga</a>, <a href="">C&#xe1;ndido L&#xf3;pez Garc&#xed;a</a> at September 04, 2015 01:30 AM

Group key management based on semigroup actions. (arXiv:1509.01075v1 [cs.IT])

In this work we provide a suite of protocols for group key management based on general semigroup actions. Construction of the key is made in a distributed and collaborative way. We provide security proofs against passive attacks and suitable examples that may enhance both the security level and communication overheads of previous existing protocols.

by <a href="">Juan Antonio Lopez-Ramos</a>, <a href="">Joachim Rosenthal</a>, <a href="">Davide Schipani</a>, <a href="">Reto Schnyder</a> at September 04, 2015 01:30 AM

Building a Truly Distributed Constraint Solver with JADE. (arXiv:1509.01040v1 [cs.AI])

Real life problems such as scheduling meeting between people at different locations can be modelled as distributed Constraint Satisfaction Problems (CSPs). Suitable and satisfactory solutions can then be found using constraint satisfaction algorithms which can be exhaustive (backtracking) or otherwise (local search). However, most research in this area tested their algorithms by simulation on a single PC with a single program entry point. The main contribution of our work is the design and implementation of a truly distributed constraint solver based on a local search algorithm using Java Agent DEvelopment framework (JADE) to enable communication between agents on different machines. Particularly, we discuss design and implementation issues related to truly distributed constraint solver which might not be critical when simulated on a single machine. Evaluation results indicate that our truly distributed constraint solver works well within the observed limitations when tested with various distributed CSPs. Our application can also incorporate any constraint solving algorithm with little modifications.

by <a href="">Ibrahim Adeyanju</a> at September 04, 2015 01:30 AM

Multi-Source Cooperative Communication with Opportunistic Interference Cancelling Relays. (arXiv:1509.01038v1 [cs.NI])

In this paper we present a multi-user cooperative protocol for wireless networks. Two sources transmit simultaneously their information blocks and relays employ opportunistically successive interference cancellation (SIC) in an effort to decode them. An adaptive decode/amplify-and-forward scheme is applied at the relays to the decoded blocks or their sufficient statistic if decoding fails. The main feature of the protocol is that SIC is exploited in a network since more opportunities arise for each block to be decoded as the number of used relays NRU is increased. This feature leads to benefits in terms of diversity and multiplexing gains that are proven with the help of an analytical outage model and a diversity-multiplexing tradeoff (DMT) analysis. The performance improvements are achieved without any network synchronization and coordination. In the final part of this work the closed-form outage probability model is used by a novel approach for offline pre-selection of the NRU relays, that have the best SIC performance, from a larger number of NR nodes. The analytical results are corroborated with extensive simulations, while the protocol is compared with orthogonal and multi-user protocols reported in the literature.

by <a href="">Antonios Argyriou</a> at September 04, 2015 01:30 AM

A Strong Distillery. (arXiv:1509.00996v1 [cs.PL])

Abstract machines for the strong evaluation of lambda-terms (that is, under abstractions) are a mostly neglected topic, despite their use in the implementation of proof assistants and higher-order logic programming languages. This paper introduces a machine for the simplest form of strong evaluation, leftmost-outermost (call-by-name) evaluation to normal form, proving it correct, complete, and bounding its overhead. Such a machine, deemed Strong Milner Abstract Machine, is a variant of the KAM computing normal forms and using just one global environment. Its properties are studied via a special form of decoding, called a distillation, into the Linear Substitution Calculus, neatly reformulating the machine as a standard micro-step strategy for explicit substitutions, namely linear leftmost-outermost reduction, i.e., the extension to normal form of linear head reduction. Additionally, the overhead of the machine is shown to be linear both in the number of steps and in the size of the initial term, validating its design. The study highlights two distinguished features of strong machines, namely backtracking phases and their interactions with abstractions and environments.

by <a href="">Beniamino Accattoli</a>, <a href="">Pablo Barenbaum</a>, <a href="">Damiano Mazza</a> at September 04, 2015 01:30 AM

Ordered Tomlinson-Harashima Precoding in Downstream. (arXiv:1509.00993v1 [cs.IT])

 is an upcoming next generation DSL standard envisioned to use bandwidth up to 212 MHz. Far-end crosstalk (FEXT) at these frequencies greatly overcomes direct links. Its cancellation based on non-linear Tomlinson-Harashima Precoding (THP) proved to show significant advantage over standard linear precoding. This paper proposes a novel THP structure in which ordering of successive interference pre-cancellation can be optimized for downstream with non-cooperating receivers. The optimized scheme is compared to existing THP structure denoted as equal-rate THP which is widely adopted in wireless downlink. Structure and performance of both methods differ significantly favoring the proposed scheme. The ordering that maximizes the minimum rate (max-min fairness) for each tone of the discrete multi-tone modulation is the familiar V-BLAST ordering. However, V-BLAST does not lead to the global maximum when applied independently on each tone. The proposed novel Dynamic Ordering (DO) strategy takes into account asymmetric channel statistics to yield the highest minimum aggregated rate.

by <a href="">Miroslav Hekrdla</a>, <a href="">Andrea Matera</a>, <a href="">Weiyang Wang</a>, <a href="">Dong Wei</a>, <a href="">Umberto Spagnolini</a> at September 04, 2015 01:30 AM

Security Analysis of Secure Force Algorithm for Wireless Sensor Networks. (arXiv:1509.00981v1 [cs.CR])

In Wireless Sensor Networks, the sensor nodes are battery powered small devices designed for long battery life. These devices also lack in terms of processing capability and memory. In order to provide high confidentiality to these resource constrained network nodes, a suitable security algorithm is needed to be deployed that can establish a balance between security level and processing overhead. The objective of this research work is to perform a security analysis and performance evaluation of recently proposed Secure Force algorithm. This paper shows the comparison of Secure Force 64, 128, and 192 bit architecture on the basis of avalanche effect (key sensitivity), entropy change analysis, image histogram, and computational time. Moreover, based on the evaluation results, the paper also suggests the possible solutions for the weaknesses of the SF algorithm.

by <a href="">Shujaat Khan</a>, <a href="">Muhammad Sohail Ibrahim</a>, <a href="">Mansoor Ebrahim</a>, <a href="">Haseeb Amjad</a> at September 04, 2015 01:30 AM

Sequential Design for Ranking Response Surfaces. (arXiv:1509.00980v1 [stat.ML])

We propose and analyze sequential design methods for the problem of ranking several response surfaces. Namely, given $L \ge 2$ response surfaces over a continuous input space $\cal X$, the aim is to efficiently find the index of the minimal response across the entire $\cal X$. The response surfaces are not known and have to be noisily sampled one-at-a-time. This setting is motivated by stochastic control applications and requires joint experimental design both in space and response-index dimensions. To generate sequential design heuristics we investigate stepwise uncertainty reduction approaches, as well as sampling based on posterior classification complexity. We also make connections between our continuous-input formulation and the discrete framework of pure regret in multi-armed bandits. To model the response surfaces we utilize kriging surrogates. Several numerical examples using both synthetic data and an epidemics control problem are provided to illustrate our approach and the efficacy of respective adaptive designs.

by <a href="">Ruimeng Hu</a>, <a href="">Mike Ludkovski</a> at September 04, 2015 01:30 AM

A Typology of Authentication Systems. (arXiv:1509.00961v1 [cs.CR])

Authentication systems are designed to give the right person access to an organization's information system and to restrict it from the wrong person. Such systems are designed by IT professionals to protect an organization's assets (e.g., the organization's network, database, or other information). Too often, such systems are designed around technical specifications without regard for the end user. We argue that doing so may actually compromise a system's security. This paper examines authentication systems from both the point of view of the organization and that of the user.

by <a href="">Christopher S. Pilson</a>, <a href="">James C. McElroy</a> at September 04, 2015 01:30 AM

Wireless Charging Technologies: Fundamentals, Standards, and Network Applications. (arXiv:1509.00940v1 [cs.NI])

Wireless charging is a technology of transmitting power through an air gap to electrical devices for the purpose of energy replenishment. The recent progress in wireless charging techniques and development of commercial products have provided a promising alternative way to address the energy bottleneck of conventionally portable battery-powered devices. However, the incorporation of wireless charging into the existing wireless communication systems also brings along a series of challenging issues with regard to implementation, scheduling, and power management. In this article, we present a comprehensive overview of wireless charging techniques, the developments in technical standards, and their recent advances in network applications. In particular, with regard to network applications, we review the mobile charger dispatch strategies, static charger scheduling strategies and wireless charger deployment strategies. Additionally, we discuss open issues and challenges in implementing wireless charging technologies. Finally, we envision some practical future network applications of wireless charging.

by <a href="">Xiao Lu</a>, <a href="">Ping Wang</a>, <a href="">Dusit Niyato</a>, <a href="">Dong In Kim</a>, <a href="">Zhu Han</a> at September 04, 2015 01:30 AM

Using Inclusion Diagrams as an Alternative to Venn Diagrams to Determine the Validity of Categorical Syllogisms. (arXiv:1509.00926v1 [cs.LO])

Inclusion diagrams are introduced as an alternative to using Venn diagrams to determine the validity of categorical syllogisms, and are used here for the analysis of diverse categorical syllogisms. As a preliminary example of a possible generalization of the use of inclusion diagrams, consideration is given also to an argument that includes more than two premises and more than three terms, the classic major, middle and minor terms in categorical syllogisms.

by <a href="">Osvaldo Skliar</a>, <a href="">Ricardo E. Monge</a>, <a href="">Sherry Gapper</a> at September 04, 2015 01:30 AM

Effective Spatial Data Partitioning for Scalable Query Processing. (arXiv:1509.00910v1 [cs.DB])

Recently, MapReduce based spatial query systems have emerged as a cost effective and scalable solution to large scale spatial data processing and analytics. MapReduce based systems achieve massive scalability by partitioning the data and running query tasks on those partitions in parallel. Therefore, effective data partitioning is critical for task parallelization, load balancing, and directly affects system performance. However, several pitfalls of spatial data partitioning make this task particularly challenging. First, data skew is very common in spatial applications. To achieve best query performance, data skew needs to be reduced. Second, spatial partitioning approaches generate boundary objects that cross multiple partitions, and add extra query processing overhead. Consequently, boundary objects need to be minimized. Third, the high computational complexity of spatial partitioning algorithms combined with massive amounts of data require an efficient approach for partitioning to achieve overall fast query response. In this paper, we provide a systematic evaluation of multiple spatial partitioning methods with a set of different partitioning strategies, and study their implications on the performance of MapReduce based spatial queries. We also study sampling based partitioning methods and their impact on queries, and propose several MapReduce based high performance spatial partitioning methods. The main objective of our work is to provide a comprehensive guidance for optimal spatial data partitioning to support scalable and fast spatial data processing in massively parallel data processing frameworks such as MapReduce. The algorithms developed in this work are open source and can be easily integrated into different high performance spatial data processing systems.

by <a href="">Ablimit Aji</a>, <a href="">Vo Hoang</a>, <a href="">Fusheng Wang</a> at September 04, 2015 01:30 AM

Locks and keys: How fast can you open several locks with too many keys?. (arXiv:1509.00844v1 [math.HO])

This short note is the result of a French "Hippocampe internship" that aims at introducing the world of research to young undergraduate French students. The problem studied is the following: imagine yourself locked in a cage barred with $n$ different locks. You are given a keyring with $N \geq n$ keys containing the $n$ keys that open the locks. On average, how many trials are required to open all locks and get out? The article studies $3$ different strategies and compares them. Implementations of the strategies are also proposed as illustrations of the theoretical results.
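The setting above is easy to simulate. Below is a quick Monte Carlo sketch of one naive strategy (for each lock, shuffle the remaining keys and try them in order, setting a matched key aside); this is only an illustration of the problem, not necessarily one of the three strategies studied in the paper:

```python
import random

def naive_trials(n_locks, n_keys, rng):
    """Count key trials to open all locks: for each lock, try the remaining
    keys in a random order until the matching key is found, then set it aside.
    (An illustrative naive strategy; key i is assumed to open lock i.)"""
    keys = list(range(n_keys))
    trials = 0
    for lock in range(n_locks):
        rng.shuffle(keys)
        for k in list(keys):
            trials += 1
            if k == lock:
                keys.remove(k)
                break
    return trials

rng = random.Random(1)
runs = [naive_trials(5, 10, rng) for _ in range(2000)]
avg = sum(runs) / len(runs)
# With m keys remaining, a lock takes (m+1)/2 trials in expectation,
# so for n=5, N=10 the total expectation is 5.5+5+4.5+4+3.5 = 22.5.
print(avg)
```

The simulated average should land close to the 22.5 predicted by linearity of expectation.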

by <a href="">Olivier Marchal</a> at September 04, 2015 01:30 AM

Digital Fixed Points, Approximate Fixed Points, and Universal Functions. (arXiv:1507.02349v2 [math.CO] UPDATED)

A. Rosenfeld introduced the notion of a digitally continuous function between digital images, and showed that although digital images need not have fixed point properties analogous to those of the Euclidean spaces modeled by the images, there often are approximate fixed point properties of such images. In the current paper, we obtain additional results concerning fixed points and approximate fixed points of digitally continuous functions. Among these are several results concerning the relationship between universal functions and the approximate fixed point property (AFPP).

by <a href="">Laurence Boxer</a>, <a href="">Ozgur Ege</a>, <a href="">Ismet Karaca</a>, <a href="">Jonathan Lopez</a> at September 04, 2015 01:30 AM


Union grouping in bipartite graphs?

I'm trying to figure out a good (and fast) solution to the following problem:

I have two roles I'm working with; let's call them players and teams, which have a many-to-many relationship (a player can be on multiple teams and a team can have multiple players). I'm working on creating a UI element on a form that allows a user to select multiple teams (checkboxes). As the user is selecting (or deselecting) teams, I'd like to display the teams grouped by the players.

So for examples:

  1. If the selected teams have no common players, each team will have its own section.

  2. If the user selects two teams and they have the same players, there would be one section containing the names of the two teams and all the players.

  3. If TEAM_A has players [1, 2, 4, 5] and TEAM_B has players [1, 3, 5, 6], there would be the following sections: SECTION_X = [TEAM_A, TEAM_B, 1, 5], SECTION_Y = [TEAM_A, 2, 4], SECTION_Z = [TEAM_B, 3, 6]

I hope that's clear. Essentially, I want to find the teams that players have in common and group by that. I was thinking maybe there is a way to do this by navigating a bipartite graph? Not exactly sure how though and I might be overthinking it. I was hoping to do this by creating some type of data structure on the server and using it on the client. I would love to hear your suggestions and I appreciate any help you can give!
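One way to approach this without any graph machinery is to group players by the exact set of selected teams they belong to: any two players with the same membership set land in the same section. A minimal Python sketch (function name and data layout are illustrative, not from the question):

```python
from collections import defaultdict

def group_by_team_membership(teams):
    """Group players into sections by the exact set of teams they belong to.

    teams: dict mapping team name -> iterable of player ids.
    Returns a list of (team_names, players) sections.
    """
    # Invert the relation: player -> set of teams containing that player.
    membership = defaultdict(set)
    for team, players in teams.items():
        for p in players:
            membership[p].add(team)
    # Players sharing the same membership set form one section.
    sections = defaultdict(list)
    for player, team_set in membership.items():
        sections[frozenset(team_set)].append(player)
    return [(sorted(ts), sorted(ps)) for ts, ps in sections.items()]

result = group_by_team_membership({"TEAM_A": [1, 2, 4, 5],
                                   "TEAM_B": [1, 3, 5, 6]})
print(result)
```

For the example in the question this produces one section for the shared players [1, 5], one for TEAM_A-only players [2, 4], and one for TEAM_B-only players [3, 6]; the whole thing is linear in the number of (team, player) pairs.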

by Ian at September 04, 2015 01:18 AM


Forced to exercise gap options

I was reading a textbook and came across some surprising stuff in the section about gap options.

Let $X$ be a payoff function such that
$$X = \begin{cases} 0, & S(T) \leq K_2 \\ S(T) - K_1, & S(T) > K_2 \end{cases}$$

The payoff would be negative if $K_2<S(T)< K_1$. The textbook says:

Notice that sometimes the option holder is forced to exercise at a loss! Perhaps the name "option" is a misnomer.

This seems surprising. Is it true that gap options can incur loss for the option holder beyond the initial price? Can you be "forced" to exercise options?

I looked online and didn't find anything. Also, feel free to edit/migrate if necessary.
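For concreteness, a quick numeric check of the payoff defined above (with illustrative strikes $K_1 = 100 > K_2 = 90$, not values from the textbook) shows the negative region:

```python
def gap_option_payoff(s_t, k1, k2):
    """Gap call payoff: pays S(T) - K1 whenever S(T) exceeds the trigger K2.

    With K1 > K2 the holder receives a negative payoff whenever
    K2 < S(T) < K1 -- the "forced to exercise at a loss" case.
    """
    return s_t - k1 if s_t > k2 else 0.0

print(gap_option_payoff(95.0, 100.0, 90.0))   # K2 < S(T) < K1: payoff is -5.0
print(gap_option_payoff(85.0, 100.0, 90.0))   # below the trigger: payoff is 0.0
print(gap_option_payoff(120.0, 100.0, 90.0))  # above K1: payoff is 20.0
```

Note the exercise is automatic once $S(T) > K_2$; there is no choice left to the holder, which is exactly why the payoff can go negative.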

by Joseph Malle at September 04, 2015 01:17 AM


CompSci Weekend SuperThread (September 04, 2015)

/r/compsci strives to be the best online community for computer scientists. We moderate posts to keep things on topic.

This Weekend SuperThread provides a discussion area for posts that might be off-topic normally. Anything Goes: post your questions, ideas, requests for help, musings, or whatever comes to mind as comments in this thread.


  • If you're looking to answer questions, sort by new comments.
  • If you're looking for answers, sort by top comment.
  • Upvote a question you've answered for visibility.
  • Downvoting is discouraged. Save it for discourteous content only.


  • It's not truly "Anything Goes". Please follow Reddiquette and use common sense.
  • Homework help questions are discouraged.
submitted by AutoModerator

September 04, 2015 01:03 AM

I'm taking my first CS class next semester, and I'm not sure if I'm prepared.

The class is "Algorithms and Computing w/C++ " Here's the course description:

“This course establishes fundamental computational techniques required for continuing study in computer science. Students design, implement and test C++ programs to solve a wide range of problems. Topics include program development, functions, control structures, text file operations, data types/classes, recursion, STL string/vector classes, arrays, and pointer variables. Object orientation and data abstraction/information hiding is emphasized.”

The only prerequisite is Calc 1. But does that sound like a class where some background knowledge will be necessary going into it? If so, what are some good resources to prepare? Any thoughts/input would be appreciated.

submitted by VyseTheFearless

September 04, 2015 12:50 AM

Planet Theory

Improved and Simplified Inapproximability for k-means

Authors: Euiwoong Lee, Melanie Schmidt, John Wright
Download: PDF
Abstract: The k-means problem consists of finding k centers in the d-dimensional Euclidean space that minimize the sum of the squared distances of all points in an input set P to their closest respective center. Awasthi et al. recently showed that there exists a constant c > 1 such that it is NP-hard to approximate the k-means objective within a factor of c. We establish that the constant c is at least 1.0013.

September 04, 2015 12:42 AM

Strong Pseudoprimes to Twelve Prime Bases

Authors: Jonathan P. Sorenson, Jonathan Webster
Download: PDF
Abstract: Let $\psi_m$ be the smallest strong pseudoprime to the first $m$ prime bases. This value is known for $1 \leq m \leq 11$. We extend this by finding $\psi_{12}$ and $\psi_{13}$. We also present an algorithm to find all integers $n\le B$ that are strong pseudoprimes to the first $m$ prime bases; with a reasonable heuristic assumption we can show that it takes at most $B^{2/3+o(1)}$ time.

September 04, 2015 12:42 AM

Finding the Leftmost Critical Factorization on Unordered Alphabet

Authors: Dmitry Kosolobov
Download: PDF
Abstract: We present a linear time and space algorithm computing the leftmost critical factorization of a given string on an unordered alphabet.

September 04, 2015 12:42 AM

Advanced Multilevel Node Separator Algorithms

Authors: Peter Sanders, Christian Schulz
Download: PDF
Abstract: A node separator of a graph is a subset S of the nodes such that removing S and its incident edges divides the graph into two disconnected components of about equal size. In this work, we introduce novel algorithms to find small node separators in large graphs. With focus on solution quality, we introduce novel flow-based local search algorithms which are integrated in a multilevel framework. In addition, we transfer techniques successfully used in the graph partitioning field. This includes the usage of edge ratings tailored to our problem to guide the graph coarsening algorithm as well as highly localized local search and iterated multilevel cycles to improve solution quality even further. Experiments indicate that flow-based local search algorithms on its own in a multilevel framework are already highly competitive in terms of separator quality. Adding additional local search algorithms further improves solution quality. Our strongest configuration almost always outperforms competing systems while on average computing 10% and 62% smaller separators than Metis and Scotch, respectively.

September 04, 2015 12:41 AM

Testing Properties of Functions on Finite Groups

Authors: Kenta Oono, Yuichi Yoshida
Download: PDF
Abstract: We study testing properties of functions on finite groups. First we consider functions of the form $f:G \to \mathbb{C}$, where $G$ is a finite group. We show that conjugate invariance, homomorphism, and the property of being proportional to an irreducible character is testable with a constant number of queries to $f$, where a character is a crucial notion in representation theory. Our proof relies on representation theory and harmonic analysis on finite groups. Next we consider functions of the form $f: G \to M_d(\mathbb{C})$, where $d$ is a fixed constant and $M_d(\mathbb{C})$ is the family of $d$ by $d$ matrices with each element in $\mathbb{C}$. For a function $g:G \to M_d(\mathbb{C})$, we show that the unitary isomorphism to $g$ is testable with a constant number of queries to $f$, where we say that $f$ and $g$ are unitary isomorphic if there exists a unitary matrix $U$ such that $f(x) = Ug(x)U^{-1}$ for any $x \in G$.

September 04, 2015 12:40 AM



I think I already linked here once to this photo series ...

I think I already linked here once to this photo series from Afghanistan from the 1950s and 60s. The contrast is particularly striking with the women in the pictures.

Now something similar from Egypt has come my way: a speech by then-President Nasser, in which he recounts how the head of the Muslim Brotherhood tried to persuade him to force women to wear the veil. And as he tells this, you can hear the audience laughing in the background, and one man even shouts: Let him wear the veil himself!

Just in case you are ever tempted to fall into the "they're all just backward over there" trap.

September 04, 2015 12:01 AM

HN Daily

Planet Theory

On the Equivalence among Problems of Bounded Width

Authors: Yoichi Iwata, Yuichi Yoshida
Download: PDF
Abstract: In this paper, we introduce a methodology, called decomposition-based reductions, for showing the equivalence among various problems of bounded-width.

First, we show that the following are equivalent for any $\alpha > 0$:

* SAT can be solved in $O^*(2^{\alpha \mathrm{tw}})$ time,

* 3-SAT can be solved in $O^*(2^{\alpha \mathrm{tw}})$ time,

* Max 2-SAT can be solved in $O^*(2^{\alpha \mathrm{tw}})$ time,

* Independent Set can be solved in $O^*(2^{\alpha \mathrm{tw}})$ time, and

* Independent Set can be solved in $O^*(2^{\alpha \mathrm{cw}})$ time, where tw and cw are the tree-width and clique-width of the instance, respectively.

Then, we introduce a new parameterized complexity class EPNL, which includes Set Cover and Directed Hamiltonicity, and show that SAT, 3-SAT, Max 2-SAT, and Independent Set parameterized by path-width are EPNL-complete. This implies that if one of these EPNL-complete problems can be solved in $O^*(c^k)$ time, then any problem in EPNL can be solved in $O^*(c^k)$ time.

September 04, 2015 12:00 AM

September 03, 2015


how to use foldLeft for aggregation vs a recursive aggregation function?

I'm going through a book "Functional Programming with Scala". This function is perplexing. I see this:

def sum(as: IndexedSeq[Int]): Int =
  if (as.size <= 1) as.headOption getOrElse 0
  else {
    val (l, r) = as.splitAt(as.length / 2)
    sum(l) + sum(r)
  }

I test it

val l = IndexedSeq(1,2,3,4,5,6); 

Both foldLeft and sum emit 21. I understand how foldLeft works, but I don't see a place in this function where anything is explicitly summed: the sequence is split and the same sum function is applied to each half. I tried to debug with println()s, but I only see the vectors and then a sum. Where exactly is 1+2+3+4+5+6 being added up?

by Don E at September 03, 2015 11:59 PM


How does a hard drive know which bit is the beginning of a byte/word?

I'm guessing I could replace the words "hard drive" with "random access medium" but let's be more specific here. Also for the sake of this question, let's not consider SSDs. Just plain old hard-drives with heads and magnetic discs.

I'd guess that the hard drive doesn't care about things such as bytes and receives instructions from the OS about where to move its head, but that would assume the OS has knowledge of sectors, cylinders, or whatever physical properties describe the drive and its stored data, so that doesn't feel right.

So, how does this work?

by Loupax at September 03, 2015 11:44 PM


Kolmogorov-Smirnov test for Generalized Pareto Distribution

I've fitted my data to a generalized Pareto distribution so as to model the returns in the tails more accurately. The interior is fitted with kernel distributions.

I would now like to test whether the original returns conform to the hypothesized distribution (i.e. the generalized Pareto distribution). Can I do this with the Kolmogorov-Smirnov test? I've already produced QQ-plots; however, I would like to conduct a statistical significance test on top of that. Can someone help? I'm kind of struggling with implementing it in Matlab.
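For reference, here is a language-agnostic sketch of the one-sample KS statistic against a hypothesized GPD, written in plain Python purely as an illustration (the shape/scale values are made up). One caveat worth knowing: if the GPD parameters were estimated from the same sample being tested, the standard KS critical values are too optimistic, and a parametric bootstrap of the statistic is the usual remedy.

```python
import random

def gpd_cdf(x, c, scale):
    # CDF of the generalized Pareto distribution (shape c > 0, location 0):
    # F(x) = 1 - (1 + c*x/scale)^(-1/c)
    return 1.0 - (1.0 + c * x / scale) ** (-1.0 / c)

def gpd_sample(n, c, scale, rng):
    # Inverse-CDF sampling: x = (scale/c) * ((1-u)^(-c) - 1)
    return [scale / c * ((1.0 - rng.random()) ** (-c) - 1.0) for _ in range(n)]

def ks_statistic(sample, cdf):
    # One-sample Kolmogorov-Smirnov statistic: maximum distance between
    # the empirical CDF and the hypothesized CDF.
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d

rng = random.Random(42)
sample = gpd_sample(1000, c=0.2, scale=1.0, rng=rng)
d = ks_statistic(sample, lambda x: gpd_cdf(x, 0.2, 1.0))
print(round(d, 4))  # small D, since the hypothesized distribution is correct
```

In Matlab the same statistic is available through kstest with a user-supplied hypothesized CDF; the bootstrap caveat about fitted parameters applies there too.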


by Peter Miller at September 03, 2015 11:17 PM

Send TRAIL STOP order when price hits a certain level, with IB TWS

Posting here after searching around and not finding any responses to basically the same question that I saw on EliteTrader, with another variant posted 10 years ago (update: the same question on Money):

Say I bought X at 100 and want to have TWS automatically submit a TRAIL STOP order when the price reaches 110, with a trailing amount of 2. Assume I don't care how long it takes for the price to reach that level. How can I do that?


  1. X is now at 95. The order should wait until X hits 110. No order parameters should change.
  2. X hits 110. Trailing order is submitted, with trailing amount 2.
  3. X goes to 115. Great. Stop price is now 113.
  4. X drops to 112. Market order triggers.

I tried attaching a condition, but got this error:

Can't add conditions for trail orders

Conditional submission of orders is supported for Limit, Market, Relative and Snap order types only.

The goal is to lock in some profit and have unlimited upside, at the expense of downside risk. A Trailing Market if Touched sounds right for selling, but apparently isn't:

A sell trailing market if touched order moves with the market price, and continually recalculates the trigger price at a fixed amount above the market price, based on the user-defined "trailing" amount.

This goes against what I want for step 1 above.

Same for Trailing Limit if Touched:

As the market price falls, the trigger price falls by the user-defined trailing amount, but if the price rises, the trigger price remains the same. When the trigger price is touched, a market order is submitted.

I don't want the trigger price to fall. It should stay 110.

by Gascoyne at September 03, 2015 11:00 PM

Portfolio Hedging under Uncertain Correlations

I have a portfolio ($w_0=1$) and two hedging assets ($w_1,w_2$) and a co-variance matrix for the three $\Sigma$. However the co-variance $\Sigma$ is only an estimate.

  • For fairly well behaved assets (close to normal), what is a good way to understand the uncertainty in the co-variance matrix using historical data?
  • Using the two assets, how do I lower my portfolio volatility $\sigma_0$ to a target amount $\sigma_T$ with the most certainty?
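On the second bullet: holding $w_0=1$ fixed, the variance-minimizing $(w_1,w_2)$ solve a 2x2 linear system given by the first-order conditions. A dependency-free sketch (the covariance numbers are made up):

```python
def min_variance_hedge(S):
    """S: 3x3 covariance of [portfolio, hedge1, hedge2], w0 fixed at 1.
    Minimizing w'Sw over (1, w1, w2) gives the first-order conditions
        S11*w1 + S12*w2 = -S01
        S12*w1 + S22*w2 = -S02."""
    det = S[1][1] * S[2][2] - S[1][2] ** 2
    w1 = (-S[0][1] * S[2][2] + S[0][2] * S[1][2]) / det
    w2 = (-S[0][2] * S[1][1] + S[0][1] * S[1][2]) / det
    w = [1.0, w1, w2]
    var = sum(w[i] * w[j] * S[i][j] for i in range(3) for j in range(3))
    return w1, w2, var

# Made-up, well-behaved covariance matrix (replace with your estimate):
Sigma = [[4.0, 1.0, 1.0],
         [1.0, 2.0, 0.5],
         [1.0, 0.5, 3.0]]
w1, w2, var_h = min_variance_hedge(Sigma)   # var_h < Sigma[0][0] = 4
```

For the first bullet, one standard approach is to bootstrap: resample the historical return rows, re-estimate $\Sigma$ and the implied weights, and look at their spread. To hit a target $\sigma_T$ above the minimum rather than the minimum itself, scale $(w_1, w_2)$ back toward zero until the implied volatility equals $\sigma_T$; the hedged variance is continuous and monotone along that path.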

References would be greatly appreciated.

by rhaskett at September 03, 2015 10:45 PM


undecidable problem and its negation is undecidable

A lot of "famous" undecidable problems are nonetheless at least semidecidable, with their complement being undecidable. The prime example is the halting problem and its complement.

However, can anybody give me an example in which both a problem and its complement are undecidable and not semidecidable? I thought about the diagonalization language Ld, but it does not seem to me that the complement is undecidable.

In that case, does that mean that a Turing Machine M can "lose" some strings that should instead be recognized, since they're part of the language we're trying to identify?

by Giulia Frascaria at September 03, 2015 10:37 PM

Solving for Hit Ratio of a Theoretical Memory System

Long time lurker, first time poster. The book I am reading is William Stallings's "Operating Systems: Internals and Design Principles", Seventh Edition.

Stallings's definition of hit ratio according to the book is thus: "the hit ratio is defined as the fraction of all memory accesses that are found in the faster memory."

One of his questions is that given a memory system with a cache memory access time of 100 nanoseconds and a main memory access time of 1,200 nanoseconds; "If the effective access time is 10% greater than the cache access time, what is the hit ratio H?"

I have already found an answer online after attempting this question myself,

(0.10 x 100) = (1-H)1200

However, I was wondering if someone could explain two things:

  1. Where did this formula come from? (I have searched the chapter prefacing this question and, so far, Stallings has not derived or discussed any formulae.)

  2. Why is it 0.10 x 100? If the effective access time is 10% greater than the cache access time, would it not be 0.90 x 100? As 10% faster would be 10 nanoseconds less, not 10 nanoseconds total.

Pre-post edit: Looking through other threads I noticed that the miss ratio is 1 - hit ratio (which is logical). Staring at the answer I found for a minute afterwards brings me another question: Is the formula that was used to answer this stating that the effective access time of the fastest memory is equal to the effective access time of the slowest memory?

Edit: After digging around in StackExchange and Google, I found an excerpt from Silberschatz that deals similarly with this problem.

Given his formula:

effective access time = H*cache access time + (1-H)*main memory access time (in this case)

Using values from the above problem:

1.10*cache access time = H*cache access time + (1-H)*main memory access time

Substituting real numbers:

1.10*100 = H*100 + (1-H)*1200

Solving finds 1090/1100, or H to be approximately .9909, giving a hit ratio of approximately 99.1%. Close to the "found" answer online, but I feel a lot better about this one.
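The Silberschatz-style equation above can be checked numerically (values from the problem):

```python
cache, main = 100.0, 1200.0      # access times in nanoseconds
effective = 1.10 * cache         # "10% greater than the cache access time"

# effective = H*cache + (1-H)*main  =>  solve for H:
H = (main - effective) / (main - cache)
print(round(H, 4))               # 0.9909, i.e. about a 99.1% hit ratio
```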

by mitchem at September 03, 2015 10:32 PM


What preference lists in the stable matching problem would always be stable, no matter what the assignment is?

Suppose we had an instance of the stable marriage problem (also called stable matching). For what n is it possible to have a set of preferences where every possible assignment results in no unstable pairs?

Clearly, n=1 works because we have one woman and one man, so they must pick each other and cannot prefer someone else. For n=2, it is possible also with (W = woman, M = man): W1 has list M1, M2; W2 has list M2, M1; M1 has list W1, W2; M2 has list W2, W1. The only possible assignments are {(W1, M1), (W2, M2)} or {(W1, M2), (W2, M1)}, which are both stable.

But for larger n I am not sure if it is possible. I tried a few examples for n=3 and n=4, but each one has an unstable matching somewhere.

I think there is some structure though: one woman/man has the Wi/Mi in sorted ascending order of i, and another has them in sorted descending order of i. But for n=3, I'm not sure how to work with the "middle" man/woman.
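Profiles like these are small enough to brute-force check. One subtlety with the n=2 lists as given: the swapped matching admits the blocking pair (W1, M1), since W1 ranks M1 first and M1 ranks W1 first; the claim does hold once the men's lists are reversed (M1: W2, W1; M2: W1, W2), so that no first choice is reciprocated. A checking sketch:

```python
from itertools import permutations

def is_stable(match, wpref, mpref):
    """match: dict woman -> man. A pair (w, m) blocks the matching if
    both strictly prefer each other to their assigned partners."""
    partner = {m: w for w, m in match.items()}
    for w, ranking in wpref.items():
        for m in ranking:
            if m == match[w]:
                break                       # w prefers her own partner
            if mpref[m].index(w) < mpref[m].index(partner[m]):
                return False                # (w, m) is a blocking pair
    return True

def all_matchings_stable(wpref, mpref):
    women, men = sorted(wpref), sorted(mpref)
    return all(is_stable(dict(zip(women, perm)), wpref, mpref)
               for perm in permutations(men))

wpref = {"W1": ["M1", "M2"], "W2": ["M2", "M1"]}
as_given = all_matchings_stable(wpref, {"M1": ["W1", "W2"], "M2": ["W2", "W1"]})
reversed_men = all_matchings_stable(wpref, {"M1": ["W2", "W1"], "M2": ["W1", "W2"]})
print(as_given, reversed_men)   # False True
```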

submitted by coaster367
[link] [1 comment]

September 03, 2015 10:20 PM

Planet Emacsen

(or emacs: Fancy minibuffer faces for Ivy completion

Today, I'll describe a recent improvement to the display of Ivy completion. To use it, add this code to your configuration:

(setq ivy-display-style 'fancy)

After this, swiper will look like this:


And counsel-M-x will look like this:


If you haven't used it before, counsel-M-x is part of the counsel package on MELPA, or part of the swiper package if you're installing from GNU ELPA. Basically, it's an M-x replacement (that I like to bind to C-t for efficiency reasons), that doubles as find-function (just press C-. instead of RET to find the function instead of calling it). If you're using counsel-M-x I highly recommend also installing smex, since then counsel-M-x will use smex for sorting the matched commands.

The update will propertize the minibuffer contents with a new set of faces:

(defcustom swiper-minibuffer-faces
  '(swiper-minibuffer-match-face-1
    swiper-minibuffer-match-face-2
    swiper-minibuffer-match-face-3
    swiper-minibuffer-match-face-4)
  "List of `swiper' faces for minibuffer group matches.")

Initially, when responding to #212, I used the original swiper faces in the minibuffer as well. But after some use, their brightness became a bit annoying. So I introduced a new set of faces that can be customized separately.

Here are the settings that I'm currently using:

(custom-set-faces
 '(swiper-minibuffer-match-face-1
   ((t :background "#dddddd")))
 '(swiper-minibuffer-match-face-2
   ((t :background "#bbbbbb" :weight bold)))
 '(swiper-minibuffer-match-face-3
   ((t :background "#bbbbff" :weight bold)))
 '(swiper-minibuffer-match-face-4
   ((t :background "#ffbbff" :weight bold))))

It's similar to what the Firefox address bar uses: gray background and bold font. Additionally, the matching parts are highlighted with pale gray, blue and magenta colors. If you like the color scheme, it's part of eclipse-theme. Another choice, for even less distraction, could be to remove the background from these faces, only leaving the bold part on the matches.

Thanks to @Wilfred for nudging me to finally implement this feature. It's been on my list for a long time, but I've been putting it off.

by (or emacs at September 03, 2015 10:00 PM


Dave Winer

Why Twitter must blow past 140

Mathew Ingram has a post where he looks at the pros and cons of Twitter getting rid of the 140-character limit. I don't think there are any cons, they have to do it. Here's why.

  1. Facebook is right. People don't click on links. It's not just true of mobile readers, it's true of all readers, everywhere, all the time. They. Don't. Click. Links. Memorize that.

  2. Facebook wants to be in the news business. So does Apple. Probably a lot of other big tech companies. Will any of them have a 140-character limit? No need to answer that.

  3. So there's Twitter with a package that can handle pictures, movies, vines, etc. But if you want to read a few paragraphs of text you have to click a link? If that limit lasts much longer it's an example of paralyzed management. Obviously there is no technical limit. If you can embed a video, you can include a few paragraphs.

  4. The user experience would have to change a tiny bit. You'd see the first 200 characters or so, then a See More link, exactly like Facebook has. Or if they want to be a little more beautiful, they could put a triangular wedge there and allow it to expand and collapse with a nice animated effect. None of this is even slightly challenging to program.

Conclusion. They have to do it. What's amazing is that they've waited this long without doing it.

September 03, 2015 09:15 PM


Do I need simulink to model the risks of an option portfolio

I wish to buy Matlab Home and learn to model the risks of a derivatives portfolio and then stress test it.

So I am guessing I will need:
Stochastic calculus
Linear algebra
Some ML algorithms?
So is it enough to purchase Matlab Home for $149, or do I need to purchase Simulink also?

by Victor123 at September 03, 2015 09:13 PM


Scala XML attribute replacement results in appending modified node as a child

All information here

I've narrowed it down to the replace method and tried changing the to etc., with none of my desired results showing up as anticipated :/

edit: worst case scenario I may just migrate this project to scales xml

submitted by Brompton_Cocktail
[link] [comment]

September 03, 2015 09:06 PM


Planet Theory

Open Problems That Might Be Easy

A speculation on the length of proofs of open problems

Broad Institute source

Nick Patterson is one of the smartest people I have ever known.

Today I would like to talk about something he once said to me and how it relates to solving open problems.

Nick now works at the Broad Institute on genomic problems—especially large data sets. For decades, he worked as a cryptographer for the British, and then the U.S. He also spent many years with Renaissance Technologies, Jim Simons’s investment hedge fund.

Years ago I consulted regularly at IDA—a think tank for the NSA that was based in Princeton. I cannot tell you what we did, nor what we did not. But I can say we worked on interesting problems in the area of communication. One of the top scientists there was Nick. He was a quite strong chess player, and at tea would often play the other top player at speed chess. I once asked Nick how he would fare against the then world champion Anatoly Karpov. He answered that it would be:

Sack, sack, mate.

I never really believed this. I always thought that he was strong enough to hold out a bit, but perhaps Nick was right. Perhaps it would be over quickly. That Karpov would find a simple, short, demonstration that would wipe Nick out.

But given how smart Nick was I often wondered if he was right when we translate his statement to mathematics:

Can open problems have simple solutions?

Could some open problems fall to: “We know […] and by induction […] we are done”?

Some Short Solutions

Leonhard Euler thought long and hard about possible generalizations of Fermat's Last Theorem. He was the first to fashion a nearly-correct proof that no cube could be a nontrivial sum of fewer than three cubes. He conjectured that no 4th-power could be a nontrivial sum of fewer than four 4th-powers, no 5th-power a sum of fewer than five 5th-powers, and so on. These cases were open for nearly two centuries, but in 1966, Leon Lander and Thomas Parkin used a computer to find

\displaystyle  27^5 + 84^5 + 110^5 + 133^5 = 144^5.

It still took 20 more years for the other shoe to drop on the {n=4} case:

\displaystyle  2682440^4 + 15365639^4 + 18796760^4 = 20615673^4.

This was the smallest member of an infinite family of solutions found by Noam Elkies—himself a master chess player—though not the smallest overall, which is

\displaystyle {95800^4 + 217519^4 + 414560^4 = 422481^4}.

The shortest solution to a problem in complexity theory that was important and open for several years was:

\displaystyle  c = a b a^{-1} b^{-1}.

Well this needs a little context. If either {a} or {b} is the identity, then this group commutator {c} is the identity. Else in a large enough permutation group like {S_5}, you can rig it so that {c} is never the identity. Thus if you have a piece of code {P} such that {P(x) = a} if some property {A(x)} holds and {P(x) = 1} otherwise, and {Q} likewise gives {Q(x) = b} iff {B(x)} holds, then the code {PQP^{-1}Q^{-1}} computes the AND gate {C(x) = A(x) \wedge B(x)}.

Since it is easy to code {\neg A(x)} by the code {a^{-1} P}, and since AND and NOT is a complete basis, the upshot is that all {O(\log n)}-depth Boolean formulas can be computed by codes of size {n^{O(1)}}. Permutations of 5 elements allow codes in the form of width-5 branching programs, so what tumbles out is the celebrated theorem of David Barrington that the complexity class {\mathsf{NC^1}} has constant-width branching programs. GLL’s old post on it played up the simplicity, but like many mate-in-n chess problems it was not simple to see in advance.
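The commutator trick is small enough to check by machine. A sketch in {S_5} with permutations as tuples (p[i] is the image of i; the particular a and b are just one non-commuting pair of 5-cycles):

```python
def compose(p, q):                    # (p o q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def commutator(a, b):                 # c = a b a^{-1} b^{-1}
    return compose(compose(a, b), compose(inverse(a), inverse(b)))

identity = (0, 1, 2, 3, 4)
a = (1, 2, 3, 4, 0)                   # the 5-cycle (0 1 2 3 4)
b = (1, 2, 4, 0, 3)                   # another 5-cycle

# If either input degenerates to the identity, so does the commutator;
# with both "true", the commutator is nontrivial: an AND gate.
print(commutator(identity, b) == identity)   # True
print(commutator(a, identity) == identity)   # True
print(commutator(a, b) == identity)          # False
```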

More recently, a seemingly complicated conjecture by Richard Stanley and Herbert Wilf about permutations was proved in an 8-page paper by Adam Marcus and Gábor Tardos. For any sequence of {k} distinct integers define its pattern to be the unique permutation of {1,\dots,k} that has the same relative ordering. For instance, the pattern of {(110,27,133,144,84)} is {(3,1,4,5,2)}. Given any length-{k} pattern {p} and {n > k} define {R_n} to be the set of permutations of {1,\dots,n} for which no length-{k} subsequence (not necessarily consecutive) has pattern {p}. Whereas {|S_n|} grows as {2^{\Theta(n\log n)}}, Stanley and Wilf conjectured that the maximum growth for {|R_n|} is {2^{O(n)}}, for any fixed {p}. Marcus and Tardos polished off a stronger conjecture by Zoltán Füredi and Péter Hajnal about matrices in basically 1.5 pages whose short lemmas read like sack, sack, mate.

We found the last example highlighted in a StackExchange thread on open problems solved with short proofs. Another we have seen mentioned elsewhere and covered in a post is Roger Apéry’s proof that {\zeta(3)} is irrational, which centered on deriving and then analyzing the new identity

\displaystyle  \zeta(3) = \frac{5}{2}\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n^3 {2n \choose n}}.

The Euler examples could be ascribed mostly to “computer-guided good fortune” but the other three clearly involve a new idea attached to a crisp new object: an equation or group or matrix construction. We wonder whether and when the possibility of such objects can be “sniffed out.”

Problems Ripe For Short Solutions?

I sometimes wonder if we have missed a simple proof—no, short is a better term—a short proof that would resolve one of our favorite open problems. Here a couple that Ken and I suggest may fall into this category.

{\bullet} Péter Frankl’s conjecture. Surely it just needs an equation or formula for the right kind of “potential function”? We noted connections to other forms in a followup post.

{\bullet} Freeman Dyson’s conjecture. It stands out even among number-theory statements that are “overwhelmingly probable” in random models, and there ought to be some principle for boxing away potential counterexamples along lines of how certain numbers with fast approximations can be proved to be transcendental. (Then again {\zeta(3)} hasn’t yet been proved transcendental.)

{\bullet} Problems involving deterministic finite automata (DFAs) are often easy to solve or show to be decidable. But here is one from Jeff Shallit that is not: Given a DFA, does it accept a string that is the binary representation of a prime number—is this decidable?

{\bullet} Even simpler is the question, given binary strings {x} and {y} of length {n}, of finding the smallest DFA {M} that accepts {x} but not {y} (or vice-versa). Can we put tight bounds on the size of {M} in terms of {n}?

{\bullet} Can a Boolean circuit {C} of size {s} computing a function {f(x_1,\dots,x_n)} be converted into a circuit {C'} of size {O(s)} with {n} gates {g_i} each computing the XOR of {f(x_1,\dots,x_i,\dots,x_n)} with {f(x_1,\dots,\neg x_i,\dots,x_n)}? This would be a Boolean analogue of the famous Derivative Lemma of Walter Baur and Volker Strassen, which we have discussed here and here.

Harder Ones?

By contrast, the 3n+1 conjecture of Lothar Collatz does not strike us as a candidate. John Conway proved that related forms are undecidable, and we suspect that it will need deep new knowledge of how intuitively multiplicative properties of integers duck-and-weave under the additive structure.

Regarding the Skolem problem we are on the fence: in a sense it involves DFAs which should be easy, but even for small {n} the connections to deep approximation problems in number theory become clear.

Here are four others—what do you think?:

{\bullet} Can polynomials modulo {6} compute the majority function? Of course we mean polynomials of low degree and we allow some flex on the meaning of “compute” provided it relates to polynomial-sized constant-depth circuits.

{\bullet} Can we at least prove that SAT cannot be computed in linear time on a Turing Machine? Note that this does not follow from the proof of {\mathsf{NLIN \neq DLIN}} as we discussed here. For all those of you who are sure that not only does {\mathsf{P \neq NP}}, but also that ETH is true, how can we not prove this “trivial lower bound?”

{\bullet} Can whether an {n}-node graph has a triangle be decided in {O(n^2)} time? Or at least better than matrix-multiplication time? Ken knew a visitor to Buffalo who admitted spending a year on this with little to show for it.

{\bullet} Can we solve the lonely runner problem? Or at least improve the number of runners that it is known to be true for?

Open Problems

How can we possibly forecast how open an open problem is?

by RJLipton+KWRegan at September 03, 2015 08:40 PM


residential mortgage prepayment modelling

I'm trying to develop a model for predicting prepayments, after reading several articles about it on the net. The model should use market data and be a behavioral model (i.e. regression/survival analysis).

For now I was thinking to follow "Modelling prepayment risk: Multinomial logit approach..." (NIBC paper).

My goal is to use it in ALM. My understanding was that every month I'll run the model on each loan, and the model will calculate a full SMM(t) curve, using the interest rate curve and other variables.

To be able to build the SMM(t) curve, which will predict the SMM for each month until the loan ends, I should build a model that predicts the probability that a loan will prepay.

So I should run a logit model.

My question is: when running the logit model, how should my data look, supposing I have 10 years of data? Meaning, should I insert each loan many times (once for each month), as in a panel data model? Won't autocorrelation "kill" it? Should I do something else?

thank you for your help!

by Guest17 at September 03, 2015 08:29 PM

Ledoit-Wolf Shrinkage estimator not giving positive definite covariance matrix

I used ten years of daily data for 407 stocks and computed the daily and monthly covariance matrices. Since I have more variables than observations for the monthly matrix, I wasn't surprised to find that matrix not invertible (and hence useless for portfolio optimization). I was surprised to see the daily covariance matrix not invertible. I then tried to shrink the matrix with the Ledoit-Wolf shrinkage estimator using the package tawny. It didn't help. It makes the covariance matrix really, really small, but still not invertible.

Does anyone have any suggestions what could be the problem? How could I improve the covariance matrix?

by User1111 at September 03, 2015 08:23 PM


Help using shell-quote-argument correctly

I'm writing some code where I want to prompt the user for a directory and then run a shell command on that directory and capture the output. My understanding is that it is wise to use shell-quote-argument to sanitise the user input before running the shell command (e.g. to properly escape spaces and other special characters).

However, I can't seem to use it correctly. For example, this simple code works

(setq dir "~/") (shell-command-to-string (concat "ls " dir)) 

and returns the contents of my home directory, but this does not:

(setq dir "~/") (shell-command-to-string (concat "ls " (shell-quote-argument dir))) 

since shell-quote-argument turns dir into "\\~/" which the shell doesn't recognise as the home directory.

Could anyone tell me what I am doing wrong?

submitted by benmaughan
[link] [2 comments]

September 03, 2015 08:22 PM


How to dual boot PC-BSD 10.3 (with zfs file system) and debian 7 (crunchbang) using grub2 boot loader in MBR?

I want to dual boot PC-BSD 10.3, with ZFS as the root file system (the only file system), and Debian 7 (CrunchBang Linux) on ext4, using the grub2 boot loader installed in the MBR, with grub managed by Debian.

All the documents I found deal with dual booting PC-BSD/FreeBSD using the UFS2 file system and Debian (How do I add PC BSD / FreeBSD to Grub 2 boot loader?). So I'm asking the question here.

From my debian's grub configuration

cat /etc/grub.d/40_custom
#!/bin/sh
exec tail -n +3 $0
# This file provides an easy way to add custom menu entries.  Simply type the
# menu entries you want to add after this comment.  Be careful not to change
# the 'exec tail' line above.
menuentry "PC-BSD" {
insmod zfs
set root=(hd0,2)
chainloader +1
}

This entry is detected by grub and shows on the boot screen. But when I select PC-BSD, it shows the error "UFS not found". I think this is because PC-BSD 10.3 is using ZFS instead of UFS2. Please provide a guide to booting PC-BSD with ZFS and Debian using grub2 managed by Debian.

by user89034 at September 03, 2015 08:13 PM


Welcome to our New Colleagues at Elemental

Earlier today we announced that we had reached an agreement to acquire Elemental Technologies of Portland, Oregon. Elemental has pioneered a number of software-based solutions for multiscreen content delivery and powers many of the world’s most innovative app-delivered video offerings and new services like 4K TV. Elemental customers use its software to process and deliver video streams that are customized for a wide variety of different formats, displays, devices, data rates, and environments. Here’s a recent photo of the Elemental team:

We have been working with Elemental to address shared customers in the Media & Entertainment (M&E) business for the past four years. During this time we have been impressed by their penchant for moving fast and their long-term vision for software-defined video.  We quickly realized that we could work together to create solutions that spanned the entire video pipeline.

Today’s announcement will allow us to work even more closely together to provide media companies with a family of on-premises, hybrid, and cloud-based solutions for Internet-based video delivery. Of course, I’ll do my best to learn about and then share information about any and all new offerings as they become available. Perhaps I’ll even make a trip to Portland to interview the Elemental folks for an AWS Podcast episode.

On behalf of the entire AWS team I would like to extend a warm welcome to our new colleagues!


by Jeff Barr at September 03, 2015 08:05 PM

Planet Theory


I recently watched the movie Whiplash, about a college jazz band director, Fletcher, played by J.K. Simmons, who torments his musicians to force them to be their best. The movie focuses on a drummer, Andrew, which makes for a great audio/video feast, but in its essentials Whiplash is a story of a professor and his student.

I can imagine playing the role, “Do you think your proof is correct? Yes or No? Are you applying Toda’s theorem correctly or are you using the same crazy logic your dad used when he left your mom?” OK, maybe not.

Nevertheless Fletcher has a point. Too often I'm seeing graduate students doing just enough to get a paper into a conference instead of pushing themselves, trying to do great work and still not being satisfied. Fletcher says the two most dangerous words in the English language are "good job".  While that might be a little cruel, we do need to push our students and ourselves to take risks in research and be okay with failing. To roughly quote John Shedd and Grace Murray Hopper, "the safest place for a ship is in the harbor, but that's not what ships are for."

Whiplash had a different kind of scene that definitely hit home. Andrew could not impress his family with the fact that he was lead drummer in the top college jazz band in the nation. I’ve been there, trying to get my mother excited by the fact that I had a STOC paper early in my career. "That's nice dear".

by Lance Fortnow ( at September 03, 2015 08:01 PM







give potential function - binary heap - extract-min in amortized const time and insert in log amortized time

Consider an ordinary binary min-heap data structure with n elements supporting the instructions INSERT and EXTRACT-MIN in $O(\lg n)$ worst-case time. Give a potential function $\Phi$ such that the amortized cost of INSERT is $O(\lg n)$ and the amortized cost of EXTRACT-MIN is $O(1)$.

I tried to solve it, but I got stuck:

Let $c_i$ denote the real cost of the $i$-th operation, and $a_i$ the amortized cost.
Let $\Phi(D_i) = \text{number of elements after } i \text{ operations}$
INSERT:$$a_i=c_i+ \Phi(D_i) - \Phi(D_{i-1}) \le \log(i) + (i)\log(i) - (i-1)\log(i-1) \le 3\log(i+1) = O(\log(i))$$
EXTRACT-MIN:$$ a_i=c_i + \Phi(D_i) - \Phi(D_{i-1}) \le \log(i) + i\log(i) - (i+1)\log(i+1) =\log(i) + i\log(i) - i\log(i+1) - \log(i+1) < 0 $$

As you can see, my problem is that I get a number lower than $0$. Can you help me?
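For what it's worth, a potential that only counts elements cannot work: EXTRACT-MIN's real cost is $\Theta(\lg n)$, so $\Phi$ must drop by about $\lg n$ per extraction. One standard choice (a sketch, not the only one; $c$ is the constant with real cost at most $c\lg n$, and $n$ the current number of elements) is

$$\Phi(D) = c\, n \lg n.$$

For INSERT ($n \to n+1$): $$a_i \le c\lg n + c\big((n+1)\lg(n+1) - n\lg n\big) = c\lg n + c\lg(n+1) + c\,n\lg\tfrac{n+1}{n} = O(\lg n),$$ since $n\lg\tfrac{n+1}{n} \le \lg e$. For EXTRACT-MIN ($n \to n-1$): $$a_i \le c\lg n + c\big((n-1)\lg(n-1) - n\lg n\big) \le c\lg n - c\lg n = O(1),$$ since $n\lg n - (n-1)\lg(n-1) = \lg n + (n-1)\big(\lg n - \lg(n-1)\big) \ge \lg n.$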

by M.Swe at September 03, 2015 07:19 PM




Designing a mechanical push-down automaton

I want to create a physical, mechanical push-down automaton accepting $a^n b^n$. I'd prefer it if the design were made of simple construction toys (LEGO, K'nex, Lincoln Logs, that kind of thing).

I ask because I want something I can show to high school students that easily demonstrates that style of machine.

submitted by thephotoman
[link] [comment]

September 03, 2015 07:10 PM


zfs dataset constantly unmounting itself after some time, others still available

I have a Linux box, that I PuTTY into, which is my backup system. It has a ZFS linear span array with 3 2TB drives. Every night I use FreeFileSync (great program) to sync files to this mount, which is mapped as a network drive.

$ zfs list
NAME                                USED  AVAIL  REFER  MOUNTPOINT
san                                3.31T  2.04T  2.87M  /san
san/vault                          3.31T  2.04T   136K  /san/vault
san/vault/falcon                    171G  2.04T   100K  /san/vault/falcon
san/vault/falcon/snapshots          171G  2.04T   171G  /san/vault/falcon/snapshots
san/vault/falcon/version            160K  2.04T    96K  /san/vault/falcon/version
san/vault/gyrfalcon                 564K  2.04T   132K  /san/vault/gyrfalcon
san/vault/gyrfalcon/snapshots       184K  2.04T   120K  /san/vault/gyrfalcon/snapshots
san/vault/gyrfalcon/version         184K  2.04T   120K  /san/vault/gyrfalcon/version
san/vault/osprey                    170G  2.04T   170G  /san/vault/osprey
san/vault/osprey/snapshots         24.2M  2.04T  24.2M  /san/vault/osprey/snapshots
san/vault/osprey/version            120K  2.04T   120K  /san/vault/osprey/version
san/vault/redtail                  2.98T  2.04T  17.2M  /san/vault/redtail
san/vault/redtail/c                 777M  2.04T  72.9M  /san/vault/redtail/c
san/vault/redtail/c/AMD            4.44M  2.04T  4.24M  /san/vault/redtail/c/AMD
san/vault/redtail/c/Users           699M  2.04T   694M  /san/vault/redtail/c/Users
san/vault/redtail/d                1.59T  2.04T   124K  /san/vault/redtail/d
san/vault/redtail/d/UserFiles      1.59T  2.04T  1.59T  /san/vault/redtail/d/UserFiles
san/vault/redtail/d/archive         283M  2.04T   283M  /san/vault/redtail/d/archive
san/vault/redtail/e                1.34T  2.04T   124K  /san/vault/redtail/e
san/vault/redtail/e/PublicArchive  1.34T  2.04T  1.34T  /san/vault/redtail/e/PublicArchive
san/vault/redtail/e/archive         283M  2.04T   283M  /san/vault/redtail/e/archive
san/vault/redtail/snapshots         184K  2.04T   120K  /san/vault/redtail/snapshots
san/vault/redtail/version          44.3G  2.04T  43.9G  /san/vault/redtail/version

When looking at the Linux box via PuTTY, the mounts are there one minute, then a few minutes later they are gone (zfs list always shows them; you have to traverse into them to see if they have been unmounted: they will be empty, or not in the parent dir at all). These are the datasets that are losing their mounts.

san/vault/redtail is empty almost every time when I come back the next morning, or a couple of hours later: just after FreeFileSync starts its sync, but before I can start moving files.

I have tried to export and import; the same problem still happens. My data is still intact...

This command will fix all of them momentarily again (for an unknown duration): zfs mount -a

This all started (a couple of weeks ago) after I made quite a few child datasets inside the parent san; before, it was only the san parent dataset (I NEVER had any problems), and no child datasets existed when all the data was laid in there. I have since moved out the data, made the datasets, and moved the data back in.

On the machine that keeps traversing the network share to synchronize, the mount gets lost mid-run, and my backups have yet to finish in almost a month now. Before it can tally the files (say 15 minutes to an hour) the fricken mount is gone again, and it hangs. Backups are in limbo.

I may have fubar'd something when I created the datasets. I had to delete a few afterwards because I wasn't happy with them. The data either wasn't visible or something, but once all was said and done, things looked perfect after several reboots and double checks!!

After this, things seemed good, but I looked into one of its child datasets, san/vault/redtail/c, and that fricker is empty too; I think it came unmounted.

Before I destroy anything, I need to know what's going on. This data is a duplicate backup of a current healthy system, but this backup is the only one, and is therefore at the mercy of the health of the source drives. So I can't afford for it to be offline at all.

One more note on SNAPSHOTs: I made a snapshot, for the first time, just prior to the seeming breakage. Could that have caused it? Can I possibly use this snapshot to fix something? Is this coincidence, or symptomatic?

See this post, Bulk remove a large directory on a ZFS without traversing it recursively, if you want information as to why I created a bunch of datasets with children...

Edit: The question has just been rewritten due to lack of attention on this post; please read it again.

by Brian Thomas at September 03, 2015 07:04 PM



Javascript reduce on array of objects

Say I want to sum a.x for each element in arr.

arr = [{x:1},{x:2},{x:4}]
arr.reduce(function(a,b){return a.x + b.x})
>> NaN

I have cause to believe that a.x is undefined at some point.

The following works fine

arr = [1,2,4]
arr.reduce(function(a,b){return a + b})
>> 7

What am I doing wrong in the first example?

by YXD at September 03, 2015 06:54 PM





"Hedging" a put option, question on exercise

I have a question on the following exercise from S. Shreve: Stochastic Calculus for Finance, I:

Exercise 4.2. In Example 4.2.1, we computed the time-zero value of the American put with strike price $5$ to be $1.36$. Consider an agent who borrows $1.36$ at time zero and buys the put. Explain how this agent can generate sufficient funds to pay off his loan (which grows by $25 \%$ each period) by trading in the stock and money markets and optimally exercising the put.

The model from Example 4.2.1 he refers to is the following: \begin{align*} S_0 & = 4 \\ S_1(H) & = 8, S_1(T) = 2 \\ S_2(HH) & = 16, S_2(HT) = S_2(TH) = 4, S_2(TT) = 1 \end{align*} with $r = 1/4$ and risk-neutral probabilities $\hat p = \hat q = 1/2$.
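As a numerical sanity check, the time-zero value $1.36$ quoted above can be reproduced by backward induction on this tree; a pure-Python sketch (not part of Shreve's exercise, parameters as above):

```python
def american_put(S0=4.0, u=2.0, d=0.5, r=0.25, K=5.0, N=2):
    """Price an American put on a binomial tree by backward induction.
    Here the risk-neutral probability is p = ((1+r) - d)/(u - d) = 1/2."""
    p = ((1 + r) - d) / (u - d)
    disc = 1.0 / (1 + r)
    # Payoffs at maturity; j counts the number of up-moves.
    V = [max(K - S0 * u ** j * d ** (N - j), 0.0) for j in range(N + 1)]
    for n in range(N - 1, -1, -1):
        V = [max(K - S0 * u ** j * d ** (n - j),                 # exercise now
                 disc * (p * V[j + 1] + (1 - p) * V[j]))         # or continue
             for j in range(n + 1)]
    return V[0]

print(round(american_put(), 2))   # 1.36
```

At the down node ($S_1(T)=2$) the exercise value $3$ beats the continuation value $2$, which is exactly the optimal early exercise the agent relies on.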

So now my question: I am not sure how to solve this. I guess the agent wants in some way to hedge so that he can always pay off the loan by utilising the option. First, how should I think about it: should he pay back his loan at time step $2$, or earlier if possible by exercising the option, or should he stay liquid until the end? And what does optimal mean: hedging with minimal investment?

Okay, I solved it by considering two scenarios: first exercising the option at time step $1$, and then at time step $2$. By exercising it at time step one I found that he has to invest an additional $0.16$, buy $1/2$ of the share, and accordingly borrow $0.16 - 1/2 \cdot 4$ from the bank/money market. Then at the first time step, as the option was exercised and the value of the portfolio equals $(1 + r)1.36$, he could just invest everything risklessly, i.e. readjusting his portfolio by not buying any shares of stock and investing $(1+r)1.36$ in the riskless asset; in this way at time step $2$ he could pay $(1+r)^2 1.36$.

In the second scenario, i.e. exercising the option at time step $2$, I found that he has to invest an additional $1.36$ and buy no shares at the initial step, and then readjust his portfolio in the next step so as to buy $1/12$ of the share if it goes up and $1.06$ if it goes down. By exercising the option, if it goes up, after paying his debt $(1+r)^2 1.36$ his portfolio has the value $1.3$, meaning he still has money, or $-0.3$ if it goes down, meaning he still has some debt (this point I do not fully understand...).

So can someone help me in understanding and solving this exercise (in case my approach is wrong)?
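For reference, the quoted time-zero value $1.36$ can be reproduced by backward induction on the binomial tree; a minimal Python sketch under the example's parameters (the function name is mine, not the book's):

```python
# Backward induction for the American put in Example 4.2.1
# (S0 = 4, u = 2, d = 1/2, r = 1/4, K = 5, risk-neutral p = q = 1/2).

def american_put_value(S0=4.0, u=2.0, d=0.5, r=0.25, K=5.0, steps=2):
    p = 0.5  # risk-neutral probability of an up move
    # terminal stock prices and payoffs
    prices = [S0 * u**k * d**(steps - k) for k in range(steps + 1)]
    values = [max(K - s, 0.0) for s in prices]
    # step backwards, taking the max of intrinsic and continuation value
    for n in range(steps - 1, -1, -1):
        prices = [S0 * u**k * d**(n - k) for k in range(n + 1)]
        cont = [(p * values[k + 1] + (1 - p) * values[k]) / (1 + r)
                for k in range(n + 1)]
        values = [max(max(K - prices[k], 0.0), cont[k]) for k in range(n + 1)]
    return values[0]

print(american_put_value())  # ≈ 1.36
```

The intermediate node values it produces ($3$ at the tail node, $0.4$ at the head node) are exactly the quantities the hedging argument trades against.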

by Stefan at September 03, 2015 06:09 PM


Why do we need to read memory on a write-miss?

I noticed that in write-back cache memories, when the CPU wants to write to a block, the block is first fetched from memory and then updated. If the block is going to be overwritten and changed by the processor anyway, why should we read that block from memory first?

by VSB at September 03, 2015 06:05 PM



What is the best algorithm to use to determine likelihood of a certain party taking action on something?

I am attempting to write an algorithm that will determine the likelihood that a party will bid on an auction given past behavior, available capital, how much they have/ need, etc. I will be writing a shell that runs the algorithm and returns the result.

What algorithm would be best for solving this? Also, what scripting language would accomplish this in the best way?

I am very new to creating/ using algorithms so any and all feedback is greatly appreciated.
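One common choice for this kind of yes/no likelihood problem is logistic regression; below is a dependency-free Python sketch with entirely invented toy features (past bid rate, available capital relative to price). It is an illustration of the technique, not a recommendation of these particular features:

```python
import math

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Plain logistic regression via stochastic gradient descent.
    Returns a weight vector whose last entry is the bias term."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            p = 1.0 / (1.0 + math.exp(-z))      # predicted bid probability
            err = p - yi
            for j in range(n_features):
                w[j] -= lr * err * xi[j]
            w[-1] -= lr * err
    return w

def predict_proba(w, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))

# made-up data: [past_bid_rate, capital_vs_price]; label 1 = party bid
X = [[0.9, 1.5], [0.8, 1.2], [0.7, 2.0], [0.1, 0.5], [0.2, 0.8], [0.05, 0.3]]
y = [1, 1, 1, 0, 0, 0]
w = train_logistic(X, y)
print(predict_proba(w, [0.85, 1.4]))  # high probability of bidding
```

Any scripting language with a statistics library (Python with scikit-learn is a common pick) will do this in a few lines; the pure-Python version just shows the mechanics.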

submitted by EmmonsVentures
[link] [4 comments]

September 03, 2015 05:41 PM

Planet Theory

One Lecture Down....

CS125, the "new", "honors-ish" Algorithms and Complexity course, got off to a good start today.  The room was full with not enough seats for people, the students asked good questions and responded well to questions asked, and we got through the amount of material I expected.  It's year 2 of the class, which is easier than year 1 in some ways (lots of material prepared), and possibly harder in others (some thinking about what needs to be tweaked or fixed from year 1).  We'll see how things shake out next week, but I'm expecting we'll be in the 30-40 student range, like last year.  I can never tell if I managed to scare students off or make them want to take it.  (The challenge is that I want to do both;  scare off people without sufficient background, but interest students who do but might not realize it and might not even be Computer Science majors.)  Pleasantly, I felt very excited during and after the lecture, and will try to hold on to that positive energy.  

In other, much much stranger news, Harvard's CS50 appears to have a "backlash" movement, as described in today's Crimson article.  Apparently, according to some, there's intense "social pressure" to take CS50, and students need to be told that it's OK not to take the class.  I find this quite odd and, from my vantage point, misguided.  (Of course, I'm not a college freshman.)  I can't recall any such organized movement against Economics 10 at Harvard, which has been for decades now the most popular class at Harvard, although even when I was a student there was something one could potentially call cult-like about it.  (Cult of Wall Street....)  But that didn't mean people complained;  if you didn't want to take the class, you didn't, not really a thing.  Sure, the CS department here has been actively trying to attract students for decades -- CS50 was a good-sized class even before David Malan took over -- and David has just been very successful at it, with a combination of organization, interesting material, vision, and, yes, good marketing.  Naturally, here in CS, we believe that in our idealized world nearly all undergraduates would have a CS course as part of their liberal arts education, and we provide several other entry courses besides CS50.  I was initially thinking the movement described in the article was just a joke, and maybe I'm being April Fooled, but I'm not sure where those responsible are coming from.

And speaking of bringing in students to CS, Ben Golub and Yaron Singer are doing a new Econ-CS class at Harvard (counts for either;  also good for Applied Math majors) simply entitled Networks.  I'm a bit jealous -- this is a class I've thought about teaching also, but was busy and happy teaching algorithms -- but hopefully now that they've started it up it means I'll get a chance to teach it some year(s) down the line.

More insight into whether our enrollment numbers are still rising (is that still possible?) next week...


by Michael Mitzenmacher ( at September 03, 2015 05:38 PM


Lambda the Ultimate Forum

Implementing "Elements of Programming" in Actor Script

In an effort to understand how ActorScript is to program with, and because I think Stepanov's "Elements of Programming" represents the best 'core' for any generic programming library, I want to try and translate the Elements into ActorScript. As there is no environment for actually testing code I thought I would post it here for comments from those who can tell if its correct or not. The idea is each top-level post should be a block of Elements code in ActorScript for comments. I will revise the top-level code according to the corrections and suggestions in the comments. I don't plan on using Unicode, but I will try and get the ASCII representations correct too.

September 03, 2015 04:59 PM


Does the low CAD positively or negatively impact Canadian Investors? [on hold]

I'm interested in getting into investing, but I have a limited amount of business experience. I plan on putting a small amount of money on the market just to see how it goes.

I don't quite understand how the low CAD affects my investments in American stock and international currency though. Suppose for a long term investment, will the eventual rise of the CAD benefit my investments or not?

by Nat at September 03, 2015 04:52 PM



Practical application of Finite State Machines

I am looking for practical applications of Finite State Machines like DFA, NFA, Moore, Mealy machines...

It would be helpful if someone could point to examples from the Linux kernel. I know that a DFA is used in string matching, as in the KMP algorithm.

What is the significance of NFA, Moore and Mealy machines?
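To make the DFA-for-string-matching application concrete, here is a Python sketch (not tied to the Linux kernel) of the classic KMP-style DFA construction: the pattern is compiled into a transition table, and the text is then scanned in a single pass with no backup.

```python
def build_dfa(pattern, alphabet):
    """KMP-style DFA: dfa[c][state] = next state after reading symbol c."""
    m = len(pattern)
    dfa = {c: [0] * m for c in alphabet}
    dfa[pattern[0]][0] = 1
    x = 0  # restart state (simulates the DFA on the pattern shifted by one)
    for j in range(1, m):
        for c in alphabet:
            dfa[c][j] = dfa[c][x]      # copy mismatch transitions
        dfa[pattern[j]][j] = j + 1     # match transition
        x = dfa[pattern[j]][x]
    return dfa

def dfa_search(text, pattern, alphabet="ab"):
    """Return the start index of the first match, or -1."""
    dfa = build_dfa(pattern, alphabet)
    state = 0
    for i, c in enumerate(text):
        state = dfa[c][state]
        if state == len(pattern):
            return i - len(pattern) + 1
    return -1

print(dfa_search("aabab", "ab"))  # 1
```

Moore and Mealy machines are the same idea with outputs attached to states or transitions, which is why they show up in hardware controllers and protocol implementations rather than just in recognizers.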

by user5507 at September 03, 2015 04:15 PM

Dave Winer

Baseball humor

I love going to baseball games because, in certain cases, it's okay to be incredibly rude to people, and it's funny.

For example, we were waiting on the train home from CitiField last night, reveling in a solid Mets win that included an inside-the-park home run, when a group of Phillies fans comes on.

I said something like "Sorry we humiliated you guys again." There is some honor in being a Phillies fan after all.

Then I noticed one of them was wearing a Yankees shirt. "What are you doing here?" A few seconds pass: "I guess they'll let anyone in these days."

PS: This started as a Facebook thread.

September 03, 2015 04:07 PM


High Scalability

How Agari Uses Airbnb's Airflow as a Smarter Cron

This is a guest repost by Siddharth Anand, Data Architect at Agari, on Airbnb's open source project Airflow, a workflow scheduler for data pipelines. Some think Airflow has a superior approach.

Workflow schedulers are systems that are responsbile for the periodic execution of workflows in a reliable and scalable manner. Workflow schedulers are pervasive - for instance, any company that has a data warehouse, a specialized database typically used for reporting, uses a workflow scheduler to coordinate nightly data loads into the data warehouse. Of more interest to companies like Agari is the use of workflow schedulers to reliably execute complex and business-critical "big" data science workloads! Agari, an email security company that tackles the problem of phishing, is increasingly leveraging data science, machine learning, and big data practices typically seen in data-driven companies like LinkedIn, Google, and Facebook in order to meet the demands of burgeoning data and dynamicism around modeling.
In a previous post, I described how we leverage AWS to build a scalable data pipeline at Agari. In this post, I discuss our need for a workflow scheduler in order to improve the reliablity of our data pipelines, providing the previous post's pipeline as a working example.

Scheduling Workflows @ Agari - A Smarter Cron

by Todd Hoff at September 03, 2015 03:56 PM


EclipseFP isn't working

I am trying to install EclipseFP on Eclipse version: Mars Release (4.5.0) on Mac OS Yosemite 10.10.5.

While the Haskell perspective appears, nothing specific to the Haskell language is working (syntax highlighting, Hoogle search, etc.)

Syntax highlight Hoogle Search

Here is my Eclipse Preferences: Eclipse preferences

And it seems that the problem is that not all necessary packages could be compiled (but I am not sure that this is the root cause).

For example, when I press the "Install from Hackage" button for the Haskell helper executables, it tries to compile them but fails because of ghc-pkg-lib-0.3:

Resolving dependencies...
Notice: installing into a sandbox located at
Configuring ghc-pkg-lib-0.3...
Building ghc-pkg-lib-0.3...
Failed to install ghc-pkg-lib-0.3
Build log ( /Applications/ ):
Configuring ghc-pkg-lib-0.3...
Building ghc-pkg-lib-0.3...
Preprocessing library ghc-pkg-lib-0.3...
[1 of 1] Compiling Language.Haskell.Packages ( src/Language/Haskell/Packages.hs, dist/dist-sandbox-b2e886dd/build/Language/Haskell/Packages.o )

    Couldn't match type ‘[Char]’
                   with ‘Distribution.ModuleName.ModuleName’
    Expected type: InstalledPackageInfo_
      Actual type: InstalledPackageInfoString
    In the expression: pkgconf
    In the expression:
      pkgconf {exposedModules = convert e, hiddenModules = convert h}

    Couldn't match type ‘ExposedModule’ with ‘[Char]’
    Expected type: [String]
      Actual type: [ExposedModule]
    In the first argument of ‘convert’, namely ‘e’
    In the ‘exposedModules’ field of a record

    Couldn't match type ‘ExposedModule’
                   with ‘Distribution.ModuleName.ModuleName’
    Expected type: [Distribution.ModuleName.ModuleName]
      Actual type: [ExposedModule]
    In the ‘hiddenModules’ field of a record
    In the expression:
      pkgconf {exposedModules = convert e, hiddenModules = convert h}
cabal.real: Error: some packages failed to install:
buildwrapper-0.9.1 depends on ghc-pkg-lib-0.3 which failed to install.
ghc-pkg-lib-0.3 failed during the building phase. The exception was:
ExitFailure 1

Could somebody please help me?

by Yury Kochubeev at September 03, 2015 03:54 PM





Show progress of Haskell program

I have a list in Haskell with some objects, and I need to find out whether one of these objects satisfies a certain condition. So, I wrote the following:

any (\x -> check x) xs

But the problem is that the check operation is very expensive and the list is quite big. I want to see the current progress at runtime, for example 50% (1000/2000 checked).
How can I do this?
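Not a Haskell answer, but the shape of one, illustrated in Python: wrap the short-circuiting scan so it reports a counter every N items. In Haskell the analogous approach needs IO (for example, a fold in IO that prints progress), since `any` itself is pure.

```python
def any_with_progress(check, xs, every=500):
    """Short-circuiting any() that prints progress every `every` items.
    Cross-language illustration only; the original question is Haskell."""
    total = len(xs)
    for i, x in enumerate(xs, 1):
        if i % every == 0:
            print(f"{i}/{total} checked ({100 * i // total}%)")
        if check(x):
            return True   # stop as soon as one element satisfies the check
    return False

print(any_with_progress(lambda n: n % 1999 == 0, range(1, 2001)))  # True
```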

by pfedotovsky at September 03, 2015 03:08 PM


Numerical computation of the Heston model integral: Simpson rule or Gauss-Legendre method

I want to price a call option using the Heston model for a given set of parameters.

theory from URL:

The integral equation (18) needs to be approximated.

I'd like to either use the Simpson Rule or Gauss-Legendre method for it.

Does anybody have MATLAB or Scilab code for it?
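Not the Heston-specific MATLAB/Scilab code the question asks for, but a generic composite Simpson rule (here in Python) that could be pointed at the integrand of equation (18); it is sanity-checked against a known integral rather than against Heston prices:

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson rule for ∫_a^b f(x) dx with n (even) subintervals."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

# sanity check on a known integral: ∫_0^π sin(x) dx = 2
print(simpson(math.sin, 0.0, math.pi))  # ≈ 2.0
```

For the Heston integral one would still need to truncate the infinite upper limit and handle the oscillatory integrand carefully; Gauss-Legendre works the same way with precomputed nodes and weights on the truncated interval.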

by Matthias at September 03, 2015 03:04 PM


Comp Sci +(Computational Linguistics/Neuroscience?)

Hi guys,

I'm back again with a second career advice post. As you may know from my first post, I'm starting college in a few days and will be majoring in computer science. However, I am also interested in two other subjects, which I think will intertwine with computer science: computational linguistics and computational neuroscience. What do you guys think is the best option for me right now, as a freshman, and what career path should I take in the (near) future?

Thanks in Advance, Shaheer

submitted by shaheer0702
[link] [comment]

September 03, 2015 02:30 PM


Are low oil prices and low shipping costs really a leading indicator for a shrinking economy

Recent article in Bloomberg saying that lowered shipping costs in the form of the Baltic Dry Index and lowered oil prices are in some way a concern for a growing global economy:

This is not the only recent financial article I have read showing concerns about the price of oil and implications that low oil prices are in any way a bad thing for the rest of us.

Now, I'm not going to disagree with part of the article's idea that lowered commodity prices are at the very least, not a good sign for growth. But the rest of the concerns ultimately having to do with lower shipping costs would in my mind have the opposite effect on the global economy than the article implies. Decreased shipping costs should increase trade, and therefore wealth.

Furthermore it seems to me that it is supply, not demand, that is causing oil prices to fall. Increased engine efficiency also should be decreasing demand. None of this has anything to do with global economic growth

What valid concerns does this article (et al) have in terms of the implication that lowered fuel costs are a leading indicator of decreased future global economic growth?

Sincerely, Clickbait Victim

Links: Baltic Dry Index:

by boulder_ruby at September 03, 2015 02:28 PM


Modal editors, any tips for navigation and inserting in quick succession?

I tried switching to god-mode and evil-mode a couple times each and I've been frustrated by my inability to effectively switch between navigating and editing rapidly.

Maybe it's just the way I approach editing text, but I find that I fall into the following pattern often: I'll type a word or two, see a typo at the beginning of the first word, navigate to it, fix it with some nav and edits, and then nav forward again. Doing this with modifiers just feels more natural to me. With modal editing I find the need to switch modes twice to move the cursor back or forward one char and then edit to be really slow and annoying.

Am I missing some shortcuts or patterns (probably!)? Does it just get better with time? I'd love to hear some anecdotes too. I've been getting pretty bad RSI in my right hand especially, haven't been able to curb it just by switching keyboards.

Edit: typo in title. On phone can't seem to edit it. My bad.

submitted by ganglygorilla
[link] [12 comments]

September 03, 2015 02:24 PM


Why the disparity of pi?

A colleague of mine recently pointed out to me that commonly in programming, pi is stored as a double with the following value:


However, when storing pi as a float, it is stored with the following value:


It's understandable that the approximation would be slightly off and that simple rounding may not do the trick, but why is it rounded so oddly? Furthermore, both seem to be used interchangeably within codebases and projects, with no apparent reasoning as to why.

Any answers would be appreciated. We're interested in the why of the value as well as why you'd interchange them so much. Anyone have any experience development-side with this?
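The short version of the usual answer: the two constants are pi correctly rounded to the nearest IEEE-754 binary64 (double) and binary32 (float) value respectively, and the float value looks "oddly rounded" only because it is the nearest representable binary32 number printed back in decimal. A small Python check (assuming IEEE-754 floats, which Python's `float` and `struct` use):

```python
import math
import struct

# Python floats are binary64; round-trip through the 'f' format to get
# the nearest binary32 value, then widen it back for printing.
pi_double = math.pi
pi_float = struct.unpack('f', struct.pack('f', math.pi))[0]

print(repr(pi_double))  # 3.141592653589793
print(repr(pi_float))   # 3.1415927410125732
```

The two are interchanged in codebases mostly out of habit or precision requirements: assigning the double constant to a float variable silently produces the float value anyway.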

Edit: Thanks for all the replies, they were incredibly useful and helped us understand the various factors that come into play when dealing with this.

submitted by ZedsTed
[link] [34 comments]

September 03, 2015 02:23 PM


Research topics - neural networks and market liquidity

I am a masters student looking for some direction on using neural network on market depth data to help predict market liquidity and bid-ask spreads. Can some of the more experienced people give me some research guidance and guide me to some papers that I should be reading? I have tried Google scholar but can't find something meaningful to kick off my research. Thanks

by user2007598 at September 03, 2015 02:22 PM



How do I show that there is no tangency portfolio?

Question: Suppose that the risk-free return is equal to the expected return of the global minimum variance portfolio. Show that there is no tangency portfolio.

A hint for the question states: Show there is no $\delta$ and $\lambda$ satisfying

$$\delta\Sigma^{-1}(\mu-R_f\iota)= \lambda\pi_\mu + (1-\lambda)\pi\iota$$

but I'm not sure what to make of it. Any help is appreciated.
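One way to read the hint, sketched in standard mean-variance notation (which may differ from the course's): the candidate tangency weights are proportional to $\Sigma^{-1}(\mu - R_f\iota)$ and must be normalized so that they sum to one, but the normalizing constant vanishes exactly when $R_f$ equals the GMV expected return:

```latex
\begin{align*}
\iota^\top \Sigma^{-1}(\mu - R_f\iota)
  &= \iota^\top\Sigma^{-1}\mu - R_f\,\iota^\top\Sigma^{-1}\iota \\
  &= \iota^\top\Sigma^{-1}\iota
     \left( \frac{\iota^\top\Sigma^{-1}\mu}{\iota^\top\Sigma^{-1}\iota}
            - R_f \right)
   = \iota^\top\Sigma^{-1}\iota\,\bigl(\mu_{\mathrm{gmv}} - R_f\bigr) = 0.
\end{align*}
```

Since the vector $\Sigma^{-1}(\mu - R_f\iota)$ has weights summing to zero, no scaling of it can satisfy the budget constraint $\iota^\top w = 1$, which is the non-existence the hint's equation encodes.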

by user2034 at September 03, 2015 01:55 PM


Compiling FreeBSD on OSX

I currently have the source for the FreeBSD kernel. I want to compile it and create an iso image, on my OSX 10.9 computer. Can this be done? Do I need any other tools?

by Greencat at September 03, 2015 01:46 PM


Finite State Machine [on hold]

The number of states required by a Finite State Machine to simulate the behaviour of a computer with a memory capable of storing m words,with each word being n bits long is

by swathy at September 03, 2015 01:31 PM

Dave Winer

Update on Slack-like code

A brief note: The new version of nodeStorage, v0.78, is out.

This is the first version with Slack-like webhooks.

I have been able to run my existing Slack webhook code unmodified against this server. So to that extent this is a clone, but I'm not claiming that at this time. The webhooks work like Slack's, covering a subset of Slack's capabilities, and are not complete.

You can see the results of my testing in this liveblog post.

September 03, 2015 01:19 PM


calculate annualised tracking errors

I have 36 months of relative returns and need to calculate the annualised tracking error. Using the 36 months of returns, is it simply as below?

st dev(36 months of returns) * sqrt(12)

Why the sqrt(12)?
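The sqrt(12) is the usual i.i.d. annualization: if monthly active returns are independent, variances add across the 12 months of a year, so the annual standard deviation is sqrt(12) times the monthly one. A Python sketch of the formula above (the return series is simulated purely for illustration):

```python
import math
import random

def annualized_tracking_error(monthly_relative_returns):
    """Sample st. dev. of monthly active returns, scaled by sqrt(12).

    Under i.i.d. monthly active returns, variance scales linearly with
    time, so the 12-month st. dev. is sqrt(12) times the monthly one.
    """
    n = len(monthly_relative_returns)
    mean = sum(monthly_relative_returns) / n
    var = sum((r - mean) ** 2 for r in monthly_relative_returns) / (n - 1)
    return math.sqrt(var) * math.sqrt(12)

random.seed(0)
returns_36m = [random.gauss(0.0, 0.01) for _ in range(36)]  # fake data
print(annualized_tracking_error(returns_36m))
```

If the monthly active returns are autocorrelated, the sqrt(12) scaling is only an approximation.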

by mHelpMe at September 03, 2015 01:10 PM

Planet Emacsen

Irreal: File Local Variables in an Org File

A nice tip from Artur Malabarba: It's often convenient to use file local variables in your Emacs files. It provides a way of setting certain Emacs variables (mode, fill-column, comment-column, etc.) on a per-file basis.

There's a problem, though, when using them in an Org file. Org considers anything after a heading to be part of that heading's subtree. Thus, if you move the node that the file local variables are in, they get dragged along with the node. The answer, Malabarba says, is to put the file local variables in their own COMMENT subtree. Those subtrees don't get exported so it's a perfect solution when using Org to write text intended for export to HTML, LaTeX, Word, and so forth.

Malabarba also gives us a bit of Elisp to make navigating such a buffer easier. If you often use file local variables in Org files, you will probably want to add his code to your Emacs configuration.

by jcs at September 03, 2015 01:06 PM


Interpretation of the CAPM model under Stochastic Portfolio Theory framework

The CAPM under the Modern Portfolio Theory approach is given as:

$$ R_i = \beta_i R_\pi $$ where $R_\pi$ is the portfolio's expected excess return.

Under the stochastic portfolio theory approach: $$ r_i = \beta_i r_\pi + \frac{\sigma_\pi^2}{2}(\beta_i -1) $$ where $r_i$ is the continuous excess return ($r_i = \log( 1+R_i)$) and $\sigma^2_\pi$ is the variance of returns.

I am not sure how to interpret this from a finance perspective (the mathematical derivation is straightforward). The SPT CAPM, in contrast to the MPT CAPM, is said to price a stock's non-systematic risk to the extent it differs from its systematic risk; however, I thought the MPT CAPM does this too? Any economic intuition would be greatly appreciated.

by dimebucker91 at September 03, 2015 12:52 PM


Generate Mesh of Shadow Volume

Given a mesh, I want to generate another mesh that is effectively a shadow volume of this mesh.

I am looking for an algorithm/solution/implementation/reference that can do this under the following constraints:

  1. I have a single, point light source. Thus:
    • Not a polygon;
    • I want hard edges/faces (not soft shadows)
  2. The light source is at infinity, in fact can be assumed to by directly above the mesh. Thus:
    • The projection is actually orthographic and depends only on a single direction.
  3. I want an exact solution.
    • The result mesh is not used for display so cannot be based on resolution dependent rasterization as many algorithms are.

A naive algorithm is to extrude all mesh triangles in the direction away from the light, e.g. downward, and do a CSG union of all these extruded "prisms".
The drawback here is that:

  1. It does this for all triangles and not just the visible ones (how to do exact visibility?);
  2. it requires a massive amount of CSG operations which are expensive.

I'm looking for a more efficient algorithm.

by Adi Shavit at September 03, 2015 12:51 PM



DragonFly BSD Digest

BSDNow 105: Virginia BSD Assembly

BSDNow 105 is up, and has all the recent news, plus an interview with Scott Courtney about the in-about-a-month vBSDCon 2015.

by Justin Sherrill at September 03, 2015 12:45 PM




How to denormalize a nested Map in Scala, with the denormalized string containing only values?

I am new to Scala and FP in general and wanted to know how a nested collection can be denormalized in Scala. For example, if we had data of a question paper for an exam as:

Map("subject" -> "science", "questionCount" -> 10, "questions" ->
  Map("1" ->
    Map("ques" -> "What is sun?", "answers" ->
      Map("1" -> "Star", "2" -> "Giant bulb", "3" -> "planet")
    ), "2" ->
    Map("ques" -> "What is moon?", "answers" ->
      Map("1" -> "Giant rock", "2" -> "White lamp", "3" -> "Planet")
    )
  )
)

When denormalized into strings, using only the values, it can be written as:

science,2,1,What is sun?,Star
science,2,1,What is sun?,Giant bulb
science,2,1,What is sun?,planet
science,2,2,What is moon?,Giant rock
science,2,2,What is moon?,White lamp
science,2,2,What is moon?,Planet

I understand that I can use map to process each item in a collection; it returns exactly the same number of items. Also, while flatMap can be used to process each item into multiple items and return more items than in the collection, I am unable to understand how I can do the denormalization using it.
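As a cross-language illustration (Python rather than Scala), the flatMap idea here is exactly a nested expansion: each question expands into one row per answer, so a "flat map" over questions, with an inner map over answers, yields all the rows. In Scala the same shape would be `questions.flatMap { case (qid, q) => answers.map(...) }`. The question count is set to 2 below to match the sample output, and the data layout is a simplified stand-in for the Map above:

```python
exam = {"subject": "science", "questionCount": 2, "questions": {
    "1": {"ques": "What is sun?",
          "answers": {"1": "Star", "2": "Giant bulb", "3": "planet"}},
    "2": {"ques": "What is moon?",
          "answers": {"1": "Giant rock", "2": "White lamp", "3": "Planet"}},
}}

# One output row per (question, answer) pair: the nested comprehension
# plays the role of flatMap (outer loop) + map (inner loop).
rows = [
    ",".join([exam["subject"], str(exam["questionCount"]),
              qid, q["ques"], ans])
    for qid, q in sorted(exam["questions"].items())
    for ans in q["answers"].values()
]
for row in rows:
    print(row)
```

The partial denormalization keeps the inner map intact: drop the inner loop and join the answer values into one bracketed field instead.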

Also, is it possible to do a partial denormalization like this?

science,2,1,What is sun?,[Star,Giant bulb,planet]
science,2,2,What is moon?,[Giant rock,White lamp,Planet]

by aa8y at September 03, 2015 12:22 PM


Set partition and exact cover

In my research I found that an algorithm which calculates all solutions of exact cover is Knuth's Algorithm X.


Universe = { 1, 2 ,3 ,4 ,5 ,6 ,7}


{1,2,3,4}, Cost 10

{5,6,7}, Cost 20

{5}, Cost 5

{6,7}, Cost 10 ]

In this example there are two possible solutions: {1,2,3,4} and {5,6,7} with cost 30, and {1,2,3,4}, {5} and {6,7} with cost 25. My aim is to get the minimum cost with the associated subsets.

Any suggestions on how to attack this problem?
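For an instance this small, exhaustive search already answers the question; a Python sketch (illustrative, not Algorithm X) that tries every family of subsets and keeps the cheapest exact cover:

```python
from itertools import combinations

def min_cost_exact_cover(universe, subsets):
    """Exhaustive search: try every family of subsets, keep the cheapest
    one that partitions the universe. Fine for small instances only."""
    best_cost, best_cover = None, None
    items = list(subsets.items())
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            covered = [e for s, _ in combo for e in s]
            # exact cover: no element twice, and everything covered
            if len(covered) == len(set(covered)) == len(universe) \
                    and set(covered) == universe:
                cost = sum(c for _, c in combo)
                if best_cost is None or cost < best_cost:
                    best_cost, best_cover = cost, [s for s, _ in combo]
    return best_cost, best_cover

universe = {1, 2, 3, 4, 5, 6, 7}
subsets = {frozenset({1, 2, 3, 4}): 10,
           frozenset({5, 6, 7}): 20,
           frozenset({5}): 5,
           frozenset({6, 7}): 10}
cost, cover = min_cost_exact_cover(universe, subsets)
print(cost)  # 25
```

For larger instances the standard route is to keep Algorithm X's search but prune with the running cost (branch and bound), since the brute force above is exponential in the number of subsets.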

by NARAYAN CHANGDER at September 03, 2015 11:58 AM


How to pull and/or calculate implied volatility of an ATM call as it's quoted every day in Python? [on hold]

Set of tasks:

  1. Pull quote for option with terms T = 1, S = K
  2. Calculate Implied Volatility of this option
  3. Plot datapoints

Thank you folks!
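Pulling the quote (task 1) depends entirely on your data vendor, but task 2 can be sketched generically: assuming Black-Scholes, back the volatility out of an observed call price by bisection, since the Black-Scholes price is increasing in sigma. A Python illustration (here the "observed" price is generated with a known sigma so the recovery can be checked):

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def implied_vol(price, S, K, T, r, lo=1e-6, hi=5.0, tol=1e-8):
    """Bisection on sigma: the call price is monotone increasing in sigma."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# ATM example (S = K), T = 1: recover the sigma used to make the price
price = bs_call(100.0, 100.0, 1.0, 0.01, 0.2)
print(implied_vol(price, 100.0, 100.0, 1.0, 0.01))  # ≈ 0.2
```

Task 3 is then just plotting the recovered vols over time with any plotting library.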

by Justin Alick at September 03, 2015 11:56 AM


If you're wondering where that stench is coming from today: ...

If you're wondering where that stench is coming from today: the FAZ let Orban dump his anti-foreigner agitation out onto the street in its politics section. Maybe it's for the best that Schirrmacher didn't have to live to see this. Disgusting.

September 03, 2015 11:01 AM


Can I install FreeBSD, OpenBSD, NetBSD and DragonFly on one disc?

Is it possible to install all of FreeBSD, OpenBSD, NetBSD and DragonFly BSD on a single disc?

What would be the proper procedure for success, and what are the caveats?

by cnst at September 03, 2015 10:36 AM

Fred Wilson

Forgive and Forget

My partner Albert has long been arguing on his blog (which you should be reading) that technology is bringing massive changes to society and we are going to have to change the way we think about things in reaction to these changes. One of those areas is privacy and the “post privacy world” that we are entering.

Last week he wrote this about the Ashley Madison hack and argued that society needs to be accepting of and forgiving of transgressions like using a website to arrange extramarital affairs. When I read that post, I thought “oh my, that’s pretty out there.” Mind you, I don’t disagree with Albert’s point at all, I just thought most people aren’t going to see it that way.

So I was pleasantly surprised this morning to see Farhad Manjoo make essentially the same argument in the New York Times.

Farhad quotes a security expert at the end of his post:

True online security is not just defending against compromise, it’s operating under the assumption that compromise will happen.

And of course that is true for the technologists who work in the teams that strive to keep our systems secure. But if you, like Albert and Farhad did, take that point to its logical conclusion, then all of us are in for having our deepest, darkest secrets outed at some point. So let's hope society becomes more forgiving over time. It's going to have to.

by Fred Wilson at September 03, 2015 10:29 AM


exact cover set problem

I am searching for a heuristic algorithm for the weighted exact cover problem shown here:

In my research I only found algorithms which calculate all solutions without any cost function, like

Do you know some algorithm for that?

My problem has a Universe of size 200 and about 500 000 subsets, so it is not possible to calculate all solutions.



Universe = { 1, 2 ,3 ,4 ,5 ,6 ,7}


{1,2,3,4}, Cost 10

{5,6,7}, Cost 20

{5}, Cost 5

{6,7}, Cost 10 ]

In this example there are two possible solutions: {1,2,3,4} and {5,6,7} with cost 30, and {1,2,3,4}, {5} and {6,7} with cost 25.
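One simple heuristic for instances too large to enumerate is a greedy rule: repeatedly pick the still-feasible subset with the lowest cost per covered element. A Python sketch (my own illustration, with no optimality guarantee, and it can get stuck even when an exact cover exists):

```python
def greedy_partition_cover(universe, subsets):
    """Greedy heuristic for weighted exact cover: repeatedly pick the
    feasible subset (disjoint from what is already covered) with the
    lowest cost per element. Fast, but not guaranteed optimal."""
    covered, chosen, total = set(), [], 0
    while covered != universe:
        candidates = [(cost / len(s), s, cost)
                      for s, cost in subsets.items()
                      if s and not (s & covered)]
        if not candidates:
            return None  # heuristic got stuck; no feasible subset left
        _, s, cost = min(candidates, key=lambda t: t[0])
        covered |= s
        chosen.append(s)
        total += cost
    return total, chosen

universe = {1, 2, 3, 4, 5, 6, 7}
subsets = {frozenset({1, 2, 3, 4}): 10,
           frozenset({5, 6, 7}): 20,
           frozenset({5}): 5,
           frozenset({6, 7}): 10}
print(greedy_partition_cover(universe, subsets))
```

On this example the greedy rule happens to find the optimal cost 25, but in general it needs to be combined with restarts or local search; metaheuristics (GRASP, simulated annealing, Lagrangian-based methods) are the usual tools at the 500 000-subset scale.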

by Hunk at September 03, 2015 10:28 AM


Bloomberg implied volatility smile for equities

I was wondering if someone knows how Bloomberg does their computations for the implied volatility smile for equities.

As far as I understand, they use a lognormal mixture to model the stock prices. But I could not find any more documentation about this topic. Thanks in advance.

by Jcl at September 03, 2015 10:07 AM




What does atomicity mean in distributed systems?

In a distributed system, how are atomic operations done when network connections inherently have the Two Generals problem?

Does atomicity have a different meaning in this context?

by Mohsen at September 03, 2015 08:56 AM


Calculating VaR with Monte Carlo simulation

I would like some help here :) I have a problem calculating VaR with Monte Carlo simulation. I have followed the steps below; is this the right way to calculate VaR, or do I need something more?

1. Generate random numbers

2. Define the correlation matrix

3. Define volatilities, drift and weights

4. Cholesky-decompose the correlation matrix

5. Multiply the random numbers by the Cholesky matrix

6. Multiply the result of step 5 by volatility and drift

7. Take the exponent of the results from step 6

8. Take log returns of the step 7 results

9. Create the weighted portfolio returns

10. Calculate the VaR (use a percentile function at the right confidence interval)

11. Calculate the volatilities of your random numbers

12. Cross-check with analytical VaR
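The steps above can be sketched end-to-end in pure Python for a toy two-asset portfolio. Everything below (volatilities, weights, correlation) is invented for illustration, the 2x2 Cholesky factor is written out by hand, and steps 7-8 are collapsed into computing simple returns from the lognormal draw:

```python
import math
import random

def monte_carlo_var(n_sims=50_000, alpha=0.99, seed=42):
    """One-day Monte Carlo VaR for a made-up two-asset portfolio."""
    random.seed(seed)
    vols = [0.02, 0.03]          # daily volatilities (step 3)
    drifts = [0.0, 0.0]
    weights = [0.6, 0.4]
    rho = 0.5                    # correlation (step 2)
    # Cholesky factor of [[1, rho], [rho, 1]] (step 4)
    l21, l22 = rho, math.sqrt(1.0 - rho * rho)

    pnl = []
    for _ in range(n_sims):
        z1, z2 = random.gauss(0, 1), random.gauss(0, 1)   # step 1
        e1, e2 = z1, l21 * z1 + l22 * z2                  # step 5
        r1 = math.exp(drifts[0] + vols[0] * e1) - 1.0     # steps 6-8
        r2 = math.exp(drifts[1] + vols[1] * e2) - 1.0
        pnl.append(weights[0] * r1 + weights[1] * r2)     # step 9
    pnl.sort()
    # step 10: the alpha-VaR is minus the (1 - alpha) quantile of returns
    return -pnl[int((1.0 - alpha) * n_sims)]

print(monte_carlo_var())
```

Step 12's cross-check: the analytical 99% VaR for these parameters is roughly $2.33 \cdot \sigma_p \approx 0.048$, which the simulation should approach as the number of paths grows.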

Thank you in advance!

by maria at September 03, 2015 08:53 AM


Proving relationship between set complement, intersection and union [migrated]

Show that $S_1 \cup S_2 = \overline{\overline{S_1}\cap \overline{S_2}}$.

So we want to show that the union of $S_1$ and $S_2$ is equal to the complement of the intersection of the complements of $S_1$ and $S_2$. I hope that makes sense!

I created some finite sets with some random values. Here's what I have so far but I'm stuck!

$$\begin{align*} S_1 &= \{1, 2, 3, 4\}\\ S_2 &= \{2, 3, 4, 5, 6\}\\ S_1 \cup S_2 &= \{1, 2, 3, 4, 5, 6\}\\ \overline{S_1} &= \{5, 6\} &\text{(everything in $S_2$ but not in $S_1$)}\\ \overline{S_2} &= \{1\} &\text{(everything in $S_1$ but not in $S_2$)} \end{align*}$$ I don't see a way to intersect $\overline{S_1}$ and $\overline{S_2}$, can someone point me in the right direction please? Am I on the right track, or way off?
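One note on the attempt above: complements have to be taken relative to a fixed universal set $U$ containing both $S_1$ and $S_2$, not relative to the other set, and the identity is usually proved element-wise rather than with example sets. A sketch of the standard argument:

```latex
\begin{align*}
x \in S_1 \cup S_2
  &\iff x \in S_1 \ \text{or}\ x \in S_2 \\
  &\iff \neg\,\bigl(x \notin S_1 \ \text{and}\ x \notin S_2\bigr) \\
  &\iff \neg\,\bigl(x \in \overline{S_1} \ \text{and}\ x \in \overline{S_2}\bigr) \\
  &\iff x \notin \overline{S_1} \cap \overline{S_2}
   \iff x \in \overline{\overline{S_1} \cap \overline{S_2}}.
\end{align*}
```

Since each direction of the chain is an equivalence, membership in the two sets coincides for every $x \in U$, which is exactly set equality.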

by Zikir Tylon at September 03, 2015 08:43 AM


Emacs and gvim display no content on FreeBSD 10.2

I just installed FreeBSD 10.2 with the Mate desktop environment and most things seem to work. I have a problem with emacs and gvim though. Since the problem has the same symptoms with both editors, I suppose the cause is somewhere else.

In both editors, the content of the edit buffer is initially black, even if it contains text. If I click with the mouse on the window top bar and move it slightly, the content is repainted and looks good: fonts, colours, syntax highlighting are as expected. But as soon as I type something in the editor window, the content becomes black again.

In gvim, I also tried the following:

  1. I open a file -> content is black
  2. I move the editor window -> content is displayed
  3. I jump to the end of the buffer with G -> content is black again

In emacs I opened slime and it had similar problems:

  1. The buffer is initially black.
  2. I move the editor window -> buffer is repainted, fonts and colours are good
  3. I type something in the slime buffer -> buffer is black again

I tried to produce screenshots using import but, interestingly, the resulting images show the correct content of the editor windows, whereas the window on screen still has black buffers. It seems that the window contents are rendered correctly but the result is not displayed on the screen.

I have also tried out gedit and it works properly.

Both emacs and vim work properly in a text console.

The only explanation I can think of is that it might be an Xorg problem (missing or wrong repaint event or something like that) but I have no clue as to where I should look now. Any ideas?


I just found out that the problem disappears if in

System -> Preferences -> Windows -> General

I check the option "Enable software compositing window manager"

Note that I have this problem on FreeBSD only. On Debian Wheezy with Mate and "software compositing window manager" disabled, emacs and gvim work normally.

by Giorgio at September 03, 2015 08:33 AM


Does flatMap's functional signature (input -> output) suggest it is doing any flattening?

flatMap signature:

 /* applies a transformation of the monad "content" by composing
  * this monad with an operation resulting in another monad instance
  * of the same type */
 def flatMap(f: A => M[B]): M[B]

Is there any way to understand from its signature (input to output), apart from the name flatMap, that it flattens the structure? Or must I read its implementation to understand that? Isn't it good coding practice that I can understand from the signature of a function (input to output, even without the function name) exactly what it does? If so, how does flatMap follow this basic programming practice, or does it violate it?

by Jas at September 03, 2015 08:31 AM




Do futures have predictive value?

Futures closely mirror their underlying, as can be seen in the charts below. Eventually, at expiration, they reach the value of the underlying. However, they seem to show no extra information about the underlying's future; i.e. they don't trend any closer to the underlying's value at expiration than the underlying itself does.

Below is an underlying, SPY.

SPY chart

And below is the ES futures Dec 2012 contract.

ES chart

However, it is possible to calculate how often a given future has been directionally correct. For instance, SPY closed at 142.79 on Dec 21, 2012. To determine the directional predictive power of the ES future, one would run the following pseudocode in a backtesting system:

$SPY_close = 142.79
$future_predicted_direction = 0
$days = 0
FOR $day (2012-01-01 .. 2012-12-21) {
  IF $EOD_quotes["SPY"][$day] > $SPY_close
  THEN $future_predicted_direction += $EOD_quotes["ES"][$day] <= $EOD_quotes["SPY"][$day]
  ELSE $future_predicted_direction += $EOD_quotes["ES"][$day] >= $EOD_quotes["SPY"][$day]
  $days++
}

print "ES futures correctly predicted direction $future_predicted_direction out of $days days, that is, "
  + int($future_predicted_direction / $days * 100) + "% of the time."
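The pseudocode above can be turned into a runnable sketch; the end-of-day quotes below are made-up placeholder values, since a real backtest would pull them from a data feed:

```python
# A runnable version of the pseudocode, with fabricated end-of-day quotes.
EOD_quotes = {
    "SPY": {"2012-12-18": 143.4, "2012-12-19": 142.1, "2012-12-20": 143.0},
    "ES":  {"2012-12-18": 143.1, "2012-12-19": 142.5, "2012-12-20": 142.8},
}
SPY_close = 142.79  # SPY close on Dec 21, 2012, from the question

predicted, days = 0, 0
for day in sorted(EOD_quotes["SPY"]):
    spy, es = EOD_quotes["SPY"][day], EOD_quotes["ES"][day]
    # Mirror the IF/ELSE in the pseudocode (booleans count as 0/1).
    if spy > SPY_close:
        predicted += es <= spy
    else:
        predicted += es >= spy
    days += 1

print(f"ES correctly predicted direction {predicted} of {days} days "
      f"({predicted / days:.0%}).")
```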

Are there any studies on this? Do futures have any predictive power?

by Gascoyne at September 03, 2015 08:08 AM

Planet Emacsen

Jorgen Schäfer: Elpy 1.9.0 released

I just released version 1.9.0 of Elpy, the Emacs Python Development Environment. This is a feature release.

Elpy is an Emacs package to bring powerful Python editing to Emacs. It combines a number of other packages, both written in Emacs Lisp as well as Python.

Quick Installation

Evaluate this:

(require 'package)
(add-to-list 'package-archives
'("elpy" .

Then run M-x package-install RET elpy RET.

Finally, run the following (and add them to your .emacs):


Changes in 1.9.0

  • Elpy now supports the autopep8 library for automatically formatting Python code. All refactoring-related code is now grouped under C-c C-r. Use C-c C-r i to fix up imports using importmagic, C-c C-r p to fix up Python code with autopep8, and C-c C-r r to bring up the old Rope refactoring menu.
  • C-c C-b will now select a region containing surrounding lines of the current indentation or more.
  • C-c C-z in a Python shell will now switch back to the last Python buffer, allowing you to use the key to cycle back and forth between the Python buffer and the shell.
  • The pattern used for C-c C-s is now customizable in elpy-rgrep-file-pattern.
  • <C-return> now can be used to send the current statement to the Python shell. Be careful, this can break with nested statements.
  • The Elpy minor mode now also works in modes derived from python-mode, not just in the mode itself.

Thanks to ChillarAnand, raylu and Chedi for their contributions!

by Jorgen Schäfer at September 03, 2015 07:46 AM


Where can I copy-out the windows 7 PATH variables on an external drive? [on hold]

I had to change my hard drive because I switched to an SSD, only to find out that my old hard drive actually had bad sectors, which is why I cannot mirror the whole drive. Now the most annoying part is the PATH variables I entered, and I am not sure I will get them all back together. I found out that they are saved in the following registry paths.

System variables:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Environment

User variables:

HKEY_CURRENT_USER\Volatile Environment

The registry files are saved at %winpath%/System32/config, but the question is not how to find them but how to open or read them and copy out only the PATH variables. I am thankful for any help!

by chriszo111 at September 03, 2015 07:19 AM



Proof for multiplicative dominance of universal probability distribution

I'm looking for the proof of Leonid Levin's theorem stating that the universal prior distribution multiplicatively dominates all other functions of its type. The original article is Levin 1974, but it is in Russian. Thanks!

by Uri at September 03, 2015 07:13 AM


How can we make $O(N)$ order-statistic queries on a Fibonacci heap in $O(N)$?

We have a Fibonacci heap with $N$ unique elements, and we want to make $O(N)$ order-statistic queries (e.g., which element would be number 7 in this collection if it were sorted). Moreover, we know that the order-statistic queries will be made in order (e.g., for $O(N)$ queries: which elements are number $1,3,5,\ldots,N$).

How can we do it in $O(N)$ time? We can't change the data structure in any way.

My thoughts: 1) The naive way is just to do delete-min $N$ times, but that would take $O(N \log N)$, so it's not good enough.

2) If we could somehow use increase-key (for a min-heap; decrease-key for a max-heap), we would be able to find the min in constant time and increase its key so it is larger than anything else in the heap. That way we could avoid delete-min and its amortized $O(\log n)$ bound entirely, but we are not allowed to modify the data structure.

EDIT: Sorry, I didn't make it clear: the heap doesn't have trees waiting to be melded. Otherwise it's not possible, since the heap could have $N$ trees of degree 0, and we can't sort $N$ items in $O(N)$ time.
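For reference, thought (1), the O(N log N) baseline to beat, can be sketched as follows, using Python's heapq binary heap as a stand-in for the Fibonacci heap (delete-min has the same O(log N) cost in both):

```python
import heapq

def order_statistics_naive(items, ranks):
    """Thought (1): answer increasing order-statistic queries by repeated
    delete-min. N pops at O(log N) each give the O(N log N) bound."""
    heap = list(items)
    heapq.heapify(heap)                     # O(N)
    answers, popped = [], 0
    for r in ranks:                         # ranks: increasing, 1-based
        while popped < r:
            smallest = heapq.heappop(heap)  # O(log N) per delete-min
            popped += 1
        answers.append(smallest)
    return answers

print(order_statistics_naive([9, 4, 7, 1, 8, 3], [1, 3, 5]))  # [1, 4, 8]
```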

by Boris Morozov at September 03, 2015 07:12 AM



Any Natural Problems shown Easy by Reduction to Horn SAT?

To show that a problem is polynomial-time solvable, an often-successful technique is to reduce it to 2SAT (that is the problem of deciding satisfiability of CNF formulas with every clause containing at most 2 literals).

On the other hand, I never seem to have success with reducing problems to Horn SAT (that is the problem for CNFs in which every clause contains at most 1 positive literal). This might not be too surprising given that the algorithm solving Horn SAT is (in some sense) more basic than the one for 2SAT, and because satisfiable Horn formulas have minimal models while most other problems don't.

Nevertheless, are there some examples of problems that are (most naturally) shown to be easy by reduction to Horn SAT?
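For concreteness, the "more basic" Horn SAT algorithm alluded to above is forward chaining (unit propagation) from the all-false assignment; a minimal sketch, with a clause encoding of my own choosing:

```python
def horn_sat(clauses):
    """Decide satisfiability of a Horn formula by forward chaining.
    Each clause is (head, body): head is the positive literal's variable,
    or None if the clause has no positive literal; body is the set of
    negated variables. Start from the empty (all-false) assignment and
    derive facts until a purely negative clause is violated."""
    true_vars = set()
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            if body <= true_vars:            # all premises already derived
                if head is None:
                    return False             # (~x1 | ... | ~xk) is violated
                if head not in true_vars:
                    true_vars.add(head)      # unit-propagate the fact
                    changed = True
    return True                              # true_vars is the minimal model

# (x), (x -> y), (~x | ~y): propagation derives x, then y, then fails.
print(horn_sat([("x", set()), ("y", {"x"}), (None, {"x", "y"})]))  # False
print(horn_sat([("x", set()), ("y", {"x"})]))                      # True
```

This naive loop is quadratic in the worst case; queue-based variants of the same propagation run in linear time.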

by Dominik Peters at September 03, 2015 06:51 AM


Intro CS Theory Question - Finite Sets [on hold]

I am starting an intro to CS theory course and have to answer the following question:

Show that if S1 and S2 are finite sets with |S1| = n and |S2| = m, then |S1 U S2| <= n + m.

I can show this by creating some random finite sets, taking the union and comparing the sizes, but I am a bit confused as to how to express this answer in proper CS theory form, whatever that may be. The question doesn't state whether to use a certain proof technique (proof by contradiction and induction are the only ones we've been introduced to so far), so I'm not sure whether what I've done is sufficient.

S1 = {1, 2, 3}
S2 = {4, 5, 6}
|S1| = 3
|S2| = 3
S1 U S2 = {1, 2, 3, 4, 5, 6}
|S1 U S2| = 6
6 <= 3 + 3
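The random-sets check described in the question can at least be automated as a sanity check (this is evidence, not a proof):

```python
import random

def check_union_bound(trials=1000):
    """Empirically check |S1 U S2| <= |S1| + |S2| on random finite sets.
    This only tests instances; it does not replace a proof."""
    for _ in range(trials):
        s1 = set(random.sample(range(50), random.randint(0, 10)))
        s2 = set(random.sample(range(50), random.randint(0, 10)))
        assert len(s1 | s2) <= len(s1) + len(s2)
    return True

print(check_union_bound())  # True
```

One standard way to phrase the actual argument: every element of S1 U S2 belongs to S1 or to S2, so listing the elements of S1 followed by those of S2 names every element of the union at least once using only n + m names.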

by Talen Jokha at September 03, 2015 06:31 AM


Garbage collection thoughts

This has already been submitted and is originally from 2012, but I think it is interesting to re-discuss now that Rust has finally been released. How much does it still match the language and problems described in this post?


by skade at September 03, 2015 06:09 AM

Planet Theory

FOCS 2015 and KARPFest80

[Forwarding an announcement by Prasad Raghavendra –Boaz]
FOCS 2015 will be held at Berkeley, California on October 18–20, 2015. Registration is open at:
The deadline for early registration is Sept 25th.

On Saturday October 17, the day immediately before FOCS 2015, the Simons Institute for the Theory of Computing will host a celebration of the work of Dick Karp on the occasion of his 80th birthday. This event, which features a dozen speakers from across the spectrum of theoretical Computer Science, will also be held at the DoubleTree Marina Hotel, the same venue as FOCS itself. All participants are asked to register to attend; registration is free, and is done in conjunction with FOCS registration. Visit KARPfest80 for more information.

by Boaz Barak at September 03, 2015 05:57 AM


Planet Theory

Automated drawing and optimization of syntax diagrams

If you've ever written a computer program you probably learned a little bit about context-free grammars, a formal method of describing the syntax of most programming languages that can be turned automatically into code for parsing those languages through the magic of compiler-compilers such as yacc. They're used for most other kinds of formal language, too: see for instance the JSON home page, which includes a grammar describing the syntax of data-description language JSON.

Go look again at the JSON page more carefully, and you'll notice that there are actually two different descriptions of the same context-free language. In a sidebar on the right of the page is a context-free grammar, a textual description of the JSON language. But more prominently, to the left of the grammar, are a bunch of diagrams that look like railroad tracks, called syntax diagrams. You can generate any string in the language by starting at the left terminal of the object track, following any smooth path in the track until you get to the right terminal, copying down the train stations you passed through on your way, and then expanding out the rectangular ones in the same way recursively. This idea of following smooth paths in a railway-like diagram has been floating around in graph drawing under the name of "confluent drawings" for a few years, but it's been used in syntax diagrams for much longer. It's really a generative model rather than a discriminative model, in that it tells you how to form all possible strings in the formal language but it doesn't tell you how to parse a string; nevertheless, it carries the same basic information as the grammar.

Or does it? Look a third time at the JSON page, and you might notice that the syntax diagram does not reflect the exact structure of the grammar. For instance, the grammar says that there should be only two smooth paths through the "object" track: one that passes through the two train stations "{" and "}", and a second one that passes through the "members" station between these two. But the diagram has a looped section of track that allows you to form infinitely many smoothed paths! They both describe the same language, but not in the same way. And this is a clue that the diagram must have been drawn by hand, or possibly drawn from a different grammar, because all the software I can find on the net for drawing syntax diagrams only performs a direct translation of a grammar to a diagram.

So this is where my new preprint, Confluent Orthogonal Drawings of Syntax Diagrams (arXiv:1509.00818, with Michael Bannister and David Brown, to appear at Graph Drawing) comes in. We take the point of view that writing grammars to be parsed automatically (essentially, a discriminative view of formal languages) and drawing syntax diagrams to visualize the strings that belong to the formal language (a generative view) are two different things, and that one can perform optimizations while going from one to the other (as the JSON people did) to make the syntax diagram easier to understand. Our paper envisions a third, intermediate description of a formal language (like the train station between the two brackets) that is abstract enough to allow these sorts of optimizations while also being flexible enough to describe more train tracks than the series-parallel ones that one gets from direct translations of grammars.

In this third description, a formal language is described as a set of nondeterministic finite automata, one for each of the tracks of a diagram or for each of the nonterminals of a grammar. However, unlike the usual form of NFAs that can only recognize regular languages, the ones we use have alphabets that include both terminals and nonterminals. You can use these to generate strings in the formal language in the same way that you would with syntax diagrams: generate a string in the language by following a path from start to accept state of the initial NFA but then recursively do the same thing to expand each nonterminal in the string.
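The generative procedure just described (follow a random path from start state to accept state, then recursively expand each nonterminal) can be sketched in code. The NFA encoding and the toy array/value grammar below are my own illustrative assumptions, not the paper's actual data structure:

```python
import random

# One NFA per nonterminal. Edges map state -> list of (symbol, next_state);
# a symbol naming another NFA is a nonterminal to expand recursively, any
# other symbol is a terminal. This toy array/value grammar is made up.
NFAS = {
    "ARRAY": {"start": 0, "accept": 3,
              "edges": {0: [("[", 1)],
                        1: [("]", 3), ("VALUE", 2)],
                        2: [("]", 3), (",", 4)],
                        4: [("VALUE", 2)]}},
    "VALUE": {"start": 0, "accept": 1,
              "edges": {0: [("1", 1), ("ARRAY", 1)]}},
}

def generate(nonterminal, rng, depth=0):
    """Follow a random path from start to accept, expanding nonterminals
    recursively, exactly as in the generative reading of syntax diagrams."""
    nfa = NFAS[nonterminal]
    state, out = nfa["start"], []
    while state != nfa["accept"]:
        choices = nfa["edges"][state]
        if depth > 4:  # bias away from recursion so the walk terminates
            nonrecursive = [c for c in choices if c[0] not in NFAS]
            choices = nonrecursive or choices
        symbol, state = rng.choice(choices)
        out.append(generate(symbol, rng, depth + 1)
                   if symbol in NFAS else symbol)
    return "".join(out)

rng = random.Random(0)
print([generate("ARRAY", rng) for _ in range(3)])
```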

An NFA is just a special kind of graph, and if we apply standard graph drawing techniques to this graph we get something that resembles a syntax diagram. But not quite: what look like vertices in the syntax diagram (its train stations) are actually not vertices, but edges, in the NFA. And the parts of the syntax diagram where the parts of the track come together, that look like confluent junctions of multiple edges in a confluent graph drawing, are actually not junctions, but vertices, in the NFA. So with a little care in the layout and a little transformation of how we draw each kind of object, we can turn our graph into a diagram. For instance this NFA (optimized from a grammar for LISP 1.5 and drawn with some edges overlapping each other so that it looks confluent but isn't really):

turns into this syntax diagram:

With this representation in hand, we can go about optimizing the diagram to improve the visualization. There are a lot of things we could have tried, but in this paper the main ones are local optimizations to individual diagrams (such as pulling together tracks that go to the same place), replacement of tail-recursive calls of a diagram within itself to create loops, and replacement of nonterminals in one diagram by copies of the corresponding diagram, to reduce the number of different components in the visualization at the expense of making individual components more complicated.

The paper also contains some experimental testing to show that these optimizations are effective, but I don't think that's really the point. The real point of the paper is to introduce the idea of algorithmically improving the appearance of syntax diagrams, rather than translating grammars directly to diagrams, and to provide both a software pipeline and an intermediate data structure that make that improvement possible.

Unfortunately, although (unlike many of my papers) we implemented what we wrote about, I don't think our actual code is in good shape for public release. And my co-author who wrote all the code (David Brown, an undergraduate at the time) has graduated and gone on to better things. But now that we've done this research and understand better what we should have been doing in the first place, I think a second system built along the same principles would work much more smoothly and could be a very helpful visualization utility.

September 03, 2015 05:14 AM

Wes Felter


George Soros models

Mr. Soros in his books talked about principles which are not used by today's financial mathematics — namely, the reflexivity of all actions on the market. Simply put, it can be described as follows: traders' expectations are based on the news and historical prices. They trade based on their expectations and hence influence prices, which then influence expectations on the next "turn". I have never seen an implementation of these ideas in a scientific framework. Have you? If yes, please give me a reference.

by Ilya at September 03, 2015 04:03 AM

How to create a basket of currency pairs with the lowest correlation in R?

My strategy is designed to buy and sell all assets of a universe and rebalance periodically. It goes either long or short. To limit risk exposure to a single currency, I would like the assets in the universe to have a weak relationship and low price correlation between them (positive or even negative, as the strategy can go short).

For example, if the universe has 3 currency pairs A, B and C, my objective is to have price correlation c1 (between A and B), c2 (between B and C) and c3 (between A and C) as low as possible. Let's say c0 is the average of c1, c2, and c3, it will give information on the "global" correlation between pairs A, B and C.

Now imagine we have 26 currency pairs (from A to Z) to consider for creating a small universe of 5 of them (3 was for the example).

The method I apply is to create every possible combination of groups of 5 currency pairs and then calculate the mean correlation c0 (correl$cor) of every single group. I also calculate the standard deviation of c1, c2 and c3 (correl$stddev), as it helps filter out groups with a low c0 but high c1, c2 and c3. Finally I sum the global correlation and the stddev in order to rank groups by a single value (correl$"cor+sddev").

In this example 16 pairs are candidates.

My dataset is an xts object with historical prices of the 16 currency pairs. I could extend it to ~50 to improve the selection, as I guess the bigger the pool, the better the selection should be.

This is how I proceed to achieve this :

# Put all symbol names in a list
pairs <- names(mydata)
# Create groups of 5 currency pairs with no duplication
group <- combn(pairs, 5)
# Calculate the number of groups
nb_group <- ncol(group)
# Create an empty object to store the result
result <- NULL
# For every group
for (i in 1:nb_group) {
  # Calculate the mean correlation for the group
  correl <- round(mean(cor(mydata[, group[, i]])), 3)
  # Transform into a data frame and label it
  correl <- as.data.frame(correl)
  colnames(correl) <- "cor"
  rownames(correl) <- toString(group[, i])
  # Calculate the stddev and the sum of correlation and stddev
  correl$stddev <- round(sd(cor(mydata[, group[, i]])), 3)
  correl$"cor+sddev" <- correl$cor + correl$stddev
  # Append to the result
  result <- rbind(correl, result)
}

# Basket of currency pairs with the lowest correlation and stddev

This return something like :

> head(result[order(result[,3]),])
                                          cor stddev      cor+sddev
GBPUSD, USDCAD, USDRUB, USDTRY, NZDUSD  0.032  0.583          0.615
GBPUSD, USDCHF, USDCAD, USDRUB, USDTRY  0.048  0.569          0.617
GBPUSD, USDJPY, USDRUB, USDTRY, NZDUSD  0.052  0.576          0.628
GBPUSD, USDCAD, EURCHF, USDRUB, USDTRY  0.048  0.582          0.630
GBPUSD, USDCAD, USDRUB, USDMXN, NZDUSD  0.065  0.566          0.631
GBPUSD, USDCHF, USDCAD, USDRUB, USDMXN  0.097  0.536          0.633

The result is the same when I average the correlation and stddev instead of summing them.

Do you think R could help achieve this, and is there a more efficient approach to create such a basket of currency pairs?

I've checked portfolio optimization packages like tawny, PortfolioAnalytics and fPortfolio, but unfortunately I'm not familiar with financial formulas and I got lost.

Thank you, Florent

by Florent at September 03, 2015 03:53 AM


Is it possible to boost the error probability of a Consensus protocol over dynamic network? [migrated]

Consider the binary consensus problem in a synchronous setting over a dynamic network (thus, there are $n$ nodes, some of which are connected by edges that may change from round to round). Given a randomized $\delta$-error protocol for Consensus in this setting, is it possible to use this protocol as a black box to obtain another Consensus protocol with smaller error?

Note that for problems with deterministic outcome (such as Sum), this can be done by repeatedly executing the protocol and taking the majority as the answer. However I am unable to figure out how for Consensus, so I wonder whether this is actually impossible to do.
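For the deterministic-outcome case mentioned above, the majority trick can be quantified exactly; a small sketch (the protocol itself is abstracted away to a single error probability):

```python
from math import comb

def majority_error(delta, k):
    """Error probability of taking the majority answer over k independent
    runs of a delta-error protocol whose correct answer is unique (the
    deterministic-outcome case, e.g. Sum). k should be odd so the
    majority is well defined."""
    # The majority is wrong exactly when more than half the runs are wrong.
    return sum(comb(k, i) * delta**i * (1 - delta)**(k - i)
               for i in range(k // 2 + 1, k + 1))

for k in (1, 5, 21):
    print(k, majority_error(0.3, k))
```

For Consensus itself, as the question anticipates, this black-box amplification is problematic: on mixed inputs both decisions can be legitimate, so different runs may disagree without any individual run being "wrong", and a majority across runs need not preserve agreement or validity.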

--- Details ---

Dynamic Network. There are $n$ nodes on a dynamic network in a synchronous setting (thus the protocol will proceed in rounds). Each node has a unique ID of size $\log(n)$. There are edges connecting pairs of nodes, and this set of edges may change from round to round (but remains fixed throughout any particular round). There may be additional restrictions to the dynamic network that allow the problem to be solved by some protocol (such as having a fixed diameter), but they are irrelevant to our discussion.

The only information known to each node is their unique ID. They do not know $n$ or the topology of the network. In particular, they do not know who their neighbors are (nodes that are adjacent to them with an edge).

How nodes communicate. The only method of communication between nodes is by broadcasting messages under the $CONGEST$ model (D. Peleg, "Distributed Computing, A Locality-Sensitive Approach"). More specifically, in each round, every node may broadcast an $O(\log n)$ sized message. Then, every node will receive ALL messages that were broadcasted by all of its neighbors in that particular round in an arbitrary order. Thus, in particular, nodes may attach their IDs onto their messages, since their IDs are of size $O(\log n)$.

Binary Consensus. The binary consensus problem over this network is as follows. Each node has a binary input ($0$ or $1$), and each must decide either $0$ or $1$, subject to the validity, termination, and agreement conditions for consensus with $\delta$ error:

  • validity means that for $z \in \{0, 1\}$, if all nodes have $z$ as its initial input, then the only possible decision is $z$.

  • termination means that each node must eventually decide.

  • agreement means that all nodes must have the same decision.

  • The error rate means that under worst case input and worst case network, the probability that the protocol does not satisfy one of the three conditions is at most $\delta$ over average coin flips of the protocol's coin.

Question. For a particular set of dynamic networks, suppose we have a lower bound on the average number of rounds incurred by any randomized $\delta$-error protocol $P$ with $\delta < \frac{1}{3}$ solving binary consensus in this setting, under a worst-case dynamic network and worst-case set of nodes' inputs, averaged over the protocol's coin flips. Does this bound automatically hold for any $\delta'$-error protocol with $\delta' < \delta$?

by Irvan at September 03, 2015 03:43 AM

What is the best-case and worst-case time complexity, in terms of n, in the form "O(...)", of the following code? [duplicate]


I've asked this question: what is the best-case and worst-case time complexity, in terms of n, in the form "O(...)", of the following code?

int i = n, x = 0;
while (i > 0) {
    i--;
    x += A[i][n-i];
}

Can anybody answer this in detail?
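As a sanity check on the loop itself, one can mirror its control flow and count how many times the body runs (an illustration, not a formal analysis):

```python
def count_iterations(n):
    """Mirror the loop's control flow and count body executions. The trip
    count depends only on n, never on the contents of A, so the best case
    and worst case coincide."""
    i, count = n, 0
    while i > 0:
        i -= 1
        count += 1
    return count

print([count_iterations(n) for n in (1, 10, 100)])  # [1, 10, 100]
```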

by Bluee and Red at September 03, 2015 03:42 AM



Gnus - Combined inboxes?

I have just started really using Gnus for email (smtp + msmtp) for three accounts. Currently I am subscribed to the three Inboxes and they appear separately in my group buffer.

My question is - is there a way to combine incoming mail from all three accounts to appear in one group? It would make it easier to keep track of all three.

As well, if anyone can point out some tricks or tips or neat things I can do with gnus I wouldn't mind!

submitted by Mister_Bubbles

September 03, 2015 03:32 AM


Is the bitonic sort algorithm stable?

I was wondering: is the bitonic sort algorithm stable? I searched the original paper, Wikipedia and some tutorials, but could not find it.

It seems to me that it should be, as it is composed of merge / sort steps; however, I was unable to find an answer anywhere.

The reason why I'm asking: I was comparing this particular implementation of bitonic sort to the sort implemented in the C++ standard library. For array length 9 it requires 28 compare / swap operations, while the standard library sort (which is unstable) requires 25. The three extra cswaps do not seem enough to make the sort stable.
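One way to settle the question empirically for a specific network: build the comparator list, run it while comparing keys only, and check whether equal keys keep their original order. The construction below is the textbook bitonic sorting network for power-of-two sizes, which may differ from the linked implementation:

```python
import itertools

def bitonic_network(n):
    """Comparator list (lo, hi) of the textbook bitonic sorter; n must be
    a power of two. A comparator (lo, hi) moves the smaller key to lo."""
    comparators = []
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    if (i & k) == 0:          # ascending sub-block
                        comparators.append((i, partner))
                    else:                     # descending sub-block
                        comparators.append((partner, i))
            j //= 2
        k *= 2
    return comparators

def run_network(keys, comparators):
    """Apply the comparator network to a key sequence."""
    a = list(keys)
    for lo, hi in comparators:
        if a[lo] > a[hi]:
            a[lo], a[hi] = a[hi], a[lo]
    return a

def is_stable_on(keys, comparators):
    """Run the network comparing keys only (tags ride along), then check
    whether equal keys kept their original relative order."""
    a = list(enumerate(keys))                 # (original index, key)
    for lo, hi in comparators:
        if a[lo][1] > a[hi][1]:               # compare keys, never tags
            a[lo], a[hi] = a[hi], a[lo]
    return all(a[t][0] < a[t + 1][0]
               for t in range(len(a) - 1) if a[t][1] == a[t + 1][1])

# Exhaustively search all 0/1 key patterns of length 8; a single violation
# proves this network is not stable.
net = bitonic_network(8)
violations = [bits for bits in itertools.product([0, 1], repeat=8)
              if not is_stable_on(bits, net)]
print(len(violations) > 0)  # True: some inputs get equal keys reordered
```

The search does turn up inputs whose equal keys end up reordered (e.g. keys 0,1,1,0,0,0,0,0), so this textbook network, at least, is not stable.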

by the swine at September 03, 2015 02:46 AM

arXiv Networking and Internet Architecture

Energy Harvesting Transmitters that Heat Up: Throughput Maximization under Temperature Constraints. (arXiv:1509.00836v1 [cs.IT])

Motivated by damage due to heating in sensor operation, we consider the throughput optimal offline data scheduling problem in an energy harvesting transmitter such that the resulting temperature increase remains below a critical level. We model the temperature dynamics of the transmitter as a linear system and determine the optimal transmit power policy under such temperature constraints as well as energy harvesting constraints over an AWGN channel. We first derive the structural properties of the solution for the general case with multiple energy arrivals. We show that the optimal power policy is piecewise monotone decreasing with possible jumps at the energy harvesting instants. We derive analytical expressions for the optimal solution in the single energy arrival case. We show that, in the single energy arrival case, the optimal power is monotone decreasing, the resulting temperature is monotone increasing, and both remain constant after the temperature hits the critical level. We then generalize the solution for the multiple energy arrival case.

by Omur Ozel, Sennur Ulukus, Pulkit Grover at September 03, 2015 01:30 AM

Better open-world website fingerprinting. (arXiv:1509.00789v1 [cs.CR])

Website fingerprinting enables an attacker to infer the source of a web page when a client is browsing through encrypted or anonymized network connections. We present a new website fingerprinting attack based on fingerprints extracted from random decision forests. Within the context of this attack we provide an analysis of the utility of previously proposed traffic features. Our attack, k-fingerprinting, performs better than current state-of-the-art attacks even against website fingerprinting defenses. We further show that error rates vary widely between web resources, and thus some patterns of use will be predictably more vulnerable to attack than others.

by Jamie Hayes, George Danezis at September 03, 2015 01:30 AM

Interference-Assisted Wireless Energy Harvesting in Cognitive Relay Network with Multiple Primary Transceivers. (arXiv:1509.00779v1 [cs.IT])

We consider a spectrum sharing scenario, where a secondary network coexists with a primary network of multiple transceivers. The secondary network consists of an energy-constrained decode-and-forward secondary relay which assists the communication between a secondary transmitter and a destination in the presence of the interference from multiple primary transmitters. The secondary relay harvests energy from the received radio-frequency signals, which include the information signal from the secondary transmitter and the primary interference. The harvested energy is then used to decode the secondary information and forward it to the secondary destination. At the relay, we adopt a time switching policy due to its simplicity that switches between the energy harvesting and information decoding over time. Specifically, we derive a closed-form expression for the secondary outage probability under the primary outage constraint and the peak power constraint at both secondary transmitter and relay. In addition, we investigate the effect of the number of primary transceivers on the optimal energy harvesting duration that minimizes the secondary outage probability. By utilizing the primary interference as a useful energy source in the energy harvesting phase, the secondary network achieves a better outage performance.

by Sanket S. Kalamkar, Adrish Banerjee at September 03, 2015 01:30 AM

A Big Data Analyzer for Large Trace Logs. (arXiv:1509.00773v1 [cs.DC])

Current generation of Internet-based services are typically hosted on large data centers that take the form of warehouse-size structures housing tens of thousands of servers. Continued availability of a modern data center is the result of a complex orchestration among many internal and external actors including computing hardware, multiple layers of intricate software, networking and storage devices, electrical power and cooling plants. During the course of their operation, many of these components produce large amounts of data in the form of event and error logs that are essential not only for identifying and resolving problems but also for improving data center efficiency and management. Most of these activities would benefit significantly from data analytics techniques to exploit hidden statistical patterns and correlations that may be present in the data. The sheer volume of data to be analyzed makes uncovering these correlations and patterns a challenging task. This paper presents BiDAl, a prototype Java tool for log-data analysis that incorporates several Big Data technologies in order to simplify the task of extracting information from data traces produced by large clusters and server farms. BiDAl provides the user with several analysis languages (SQL, R and Hadoop MapReduce) and storage backends (HDFS and SQLite) that can be freely mixed and matched so that a custom tool for a specific task can be easily constructed. BiDAl has a modular architecture so that it can be extended with other backends and analysis languages in the future. In this paper we present the design of BiDAl and describe our experience using it to analyze publicly-available traces from Google data clusters, with the goal of building a realistic model of a complex data center.

by Alkida Balliu, Dennis Olivetti, Ozalp Babaoglu, Moreno Marzolla, Alina Sîrbu at September 03, 2015 01:30 AM

A Game-theoretic Formulation of the Homogeneous Self-Reconfiguration Problem. (arXiv:1509.00737v1 [cs.MA])

In this paper we formulate the homogeneous two- and three-dimensional self-reconfiguration problem over discrete grids as a constrained potential game. We develop a game-theoretic learning algorithm based on the Metropolis-Hastings algorithm that solves the self-reconfiguration problem in a globally optimal fashion. Both a centralized and a fully distributed algorithm are presented and we show that the only stochastically stable state is the potential function maximizer, i.e. the desired target configuration. These algorithms compute transition probabilities in such a way that even though each agent acts in a self-interested way, the overall collective goal of self-reconfiguration is achieved. Simulation results confirm the feasibility of our approach and show convergence to desired target configurations.

by Daniel Pickem, Magnus Egerstedt, Jeff S. Shamma at September 03, 2015 01:30 AM

A Multilayer Model of Computer Networks. (arXiv:1509.00721v1 [cs.NI])

The fundamental concept of applying the system methodology to network analysis declares that network architecture should take into account services and applications which this network provides and supports. This work introduces a formal model of computer networks on the basis of the hierarchical multilayer networks. In turn, individual layers are represented as multiplex networks. The concept of layered networks provides conditions of top-down consistency of the model. Next, we determined the necessary set of layers for network architecture representation with regard to applying the system methodology to network analysis.

by Andrey A. Shchurov at September 03, 2015 01:30 AM

A Fuzzy Clustering Based Approach for Mining Usage Profiles from Web Log Data. (arXiv:1509.00693v1 [cs.DB])

The World Wide Web continues to grow at an amazing rate in both the size and complexity of Web sites and is well on its way to being the main reservoir of information and data. Due to this increase in growth and complexity of the WWW, web site publishers are facing increasing difficulty in attracting and retaining users. To design popular and attractive websites, publishers must understand their users' needs. Therefore analyzing user behaviour is an important part of web page design. Web Usage Mining (WUM) is the application of data mining techniques to web usage log repositories in order to discover the usage patterns that can be used to analyze the users' navigational behavior. WUM contains three main steps: preprocessing, knowledge extraction and results analysis. The goal of the preprocessing stage in Web usage mining is to transform the raw web log data into a set of user profiles. Each such profile captures a sequence or a set of URLs representing a user session.

by Zahid Ansari, Mohammad Fazle Azeem, A. Vinaya Babu, Waseem Ahmed at September 03, 2015 01:30 AM

Discovery of Web Usage Profiles Using Various Clustering Techniques. (arXiv:1509.00692v1 [cs.DB])

The explosive growth of World Wide Web (WWW) has necessitated the development of Web personalization systems in order to understand the user preferences to dynamically serve customized content to individual users. To reveal information about user preferences from Web usage data, Web Usage Mining (WUM) techniques are extensively being applied to the Web log data. Clustering techniques are widely used in WUM to capture similar interests and trends among users accessing a Web site. Clustering aims to divide a data set into groups or clusters where inter-cluster similarities are minimized while the intra cluster similarities are maximized. This paper reviews four of the popularly used clustering techniques: k-Means, k-Medoids, Leader and DBSCAN. These techniques are implemented and tested against the Web user navigational data. Performance and validity results of each technique are presented and compared.

by <a href="">Zahid Ansari</a>, <a href="">Waseem Ahmed</a>, <a href="">M.F. Azeem</a>, <a href="">A.Vinaya Babu</a> at September 03, 2015 01:30 AM

A Fuzzy Approach for Feature Evaluation and Dimensionality Reduction to Improve the Quality of Web Usage Mining Results. (arXiv:1509.00690v1 [cs.DB])

Web Usage Mining is the application of data mining techniques to web usage log repositories in order to discover the usage patterns that can be used to analyze users' navigational behavior. During the preprocessing stage, raw web log data is transformed into a set of user profiles. Each user profile captures a set of URLs representing a user session. Clustering can be applied to this sessionized data in order to capture similar interests and trends among users' navigational patterns. Since the sessionized data may contain thousands of user sessions and each user session may consist of hundreds of URL accesses, dimensionality reduction is achieved by eliminating the low-support URLs. Very small sessions are also removed in order to filter out the noise from the data. But direct elimination of low-support URLs and small-sized sessions may result in the loss of a significant amount of information, especially when the count of low-support URLs and small sessions is large. We propose a fuzzy solution to deal with this problem by assigning weights to URLs and user sessions based on a fuzzy membership function. After assigning the weights we apply a Fuzzy c-Means clustering algorithm to discover the clusters of user profiles. In this paper, we describe our fuzzy set-theoretic approach to perform feature selection (or dimensionality reduction) and session weight assignment. Finally, we compare our soft-computing-based approach to dimensionality reduction with the traditional approach of direct elimination of small sessions and low-support-count URLs. Our results show that fuzzy feature evaluation and dimensionality reduction result in better performance and validity indices for the discovered clusters.

by <a href="">Zahid Ansari</a>, <a href="">M.F.Azeem</a>, <a href="">A. Vinaya Babu</a>, <a href="">Waseem Ahmed</a> at September 03, 2015 01:30 AM

On Normalized Compression Distance and Large Malware. (arXiv:1509.00689v1 [cs.CR])

Normalized Compression Distance (NCD) is a popular tool that uses compression algorithms to cluster and classify data in a wide range of applications. Existing discussions of NCD's theoretical merit rely on certain theoretical properties of compression algorithms. However, we demonstrate that many popular compression algorithms don't seem to satisfy these theoretical properties. We explore the relationship between some of these properties and file size, demonstrating that this theoretical problem is actually a practical problem for classifying malware with large file sizes, and we then introduce some variants of NCD that mitigate this problem.

by <a href="">Rebecca Schuller Borbely</a> at September 03, 2015 01:30 AM

A note on strictly positive logics and word rewriting systems. (arXiv:1509.00666v1 [math.LO])

We establish a natural translation from word rewriting systems to strictly positive polymodal logics. Thereby, the latter can be considered as a generalization of the former. As a corollary we obtain examples of undecidable strictly positive normal modal logics. The translation has its counterpart on the level of proofs: we formulate a natural deep inference proof system for strictly positive logics generalizing derivations in word rewriting systems. We also formulate some open questions related to the theory of modal companions of superintuitionistic logics that was initiated by L.L. Maximova and V.V. Rybakov.

by <a href="">Lev D. Beklemishev</a> at September 03, 2015 01:30 AM

Termination of rewrite relations on $\lambda$-terms based on Girard's notion of reducibility. (arXiv:1509.00649v1 [cs.LO])

In this paper, we show how to extend the notion of reducibility introduced by Girard for proving the termination of $\beta$-reduction in the polymorphic $\lambda$-calculus, to prove the termination of various kinds of rewrite relations on $\lambda$-terms, including rewriting modulo some equational theory and rewriting with matching modulo $\beta$$\eta$, by using the notion of computability closure. This provides a powerful termination criterion for various higher-order rewriting frameworks, including Klop's Combinatory Reductions Systems with simple types and Nipkow's Higher-order Rewrite Systems.

by <a href="">Fr&#xe9;d&#xe9;ric Blanqui</a> (DEDUCTEAM) at September 03, 2015 01:30 AM

How to Generate Security Cameras: Towards Defence Generation for Socio-Technical Systems. (arXiv:1509.00643v1 [cs.CR])

Recently security researchers have started to look into automated generation of attack trees from socio-technical system models. The obvious next step in this trend of automated risk analysis is automating the selection of security controls to treat the detected threats. However, the existing socio-technical models are too abstract to represent all security controls recommended by practitioners and standards. In this paper we propose an attack-defence model, consisting of a set of attack-defence bundles, to be generated and maintained with the socio-technical model. The attack-defence bundles can be used to synthesise attack-defence trees directly from the model to offer basic attack-defence analysis, but also they can be used to select and maintain the security controls that cannot be handled by the model itself.

by <a href="">Olga Gadyatskaya</a> at September 03, 2015 01:30 AM

Correlated Poisson processes and self-decomposable laws. (arXiv:1509.00629v1 [math.PR])

We discuss in detail a procedure to produce two Poisson processes $M(t), N(t)$ associated to positively correlated, self-decomposable, exponential renewals. The main result of this paper is a closed, elementary form for the joint distribution $p_{m,n}(s,t)$ of the pair $(M(s), N(t))$: this turns out to be instrumental in producing explicit algorithms with applications to option pricing, as well as to credit and insurance risk modeling, that will be discussed in a separate paper.

by <a href="">Nicola Cufaro Petroni</a>, <a href="">Piergiacomo Sabino</a> at September 03, 2015 01:30 AM

Model Checking Epistemic Halpern-Shoham Logic Extended with Regular Expressions. (arXiv:1509.00608v1 [cs.LO])

The Epistemic Halpern-Shoham logic (EHS) is a temporal-epistemic logic that combines the interval operators of the Halpern-Shoham logic with epistemic modalities. The semantics of EHS is based on interpreted systems whose labelling function is defined on the endpoints of intervals. We show that this definition can be generalised by allowing the labelling function to be based on the whole interval by means of regular expressions. We prove that all the positive results known for EHS, notably the attractive complexity of its model checking problem for some of its fragments, still hold for its generalisation. We also propose the new logic EHSre which operates on standard Kripke structures and has expressive power equivalent to that of EHS with regular expressions. We compare the expressive power of EHSre with standard temporal logics.

by <a href="">Alessio Lomuscio</a>, <a href="">Jakub Michaliszyn</a> at September 03, 2015 01:30 AM

Backhaul-Aware Caching Placement for Wireless Networks. (arXiv:1509.00558v1 [cs.IT])

As the capacity demand of mobile applications keeps increasing, the backhaul network is becoming a bottleneck to support high quality of experience (QoE) in next-generation wireless networks. Content caching at base stations (BSs) is a promising approach to alleviate the backhaul burden and reduce user-perceived latency. In this paper, we consider a wireless caching network where all the BSs are connected to a central controller via backhaul links. In such a network, users can obtain the required data from candidate BSs if the data are pre-cached. Otherwise, the user data need to be first retrieved from the central controller to local BSs, which introduces extra delay over the backhaul. In order to reduce the download delay, the caching placement strategy needs to be optimized. We formulate such a design problem as the minimization of the average download delay over user requests, subject to the caching capacity constraint of each BS. Different from existing works, our model takes BS cooperation in the radio access into consideration and is fully aware of the propagation delay on the backhaul links. The design problem is a mixed integer programming problem and is highly complicated, and thus we relax the problem and propose a low-complexity algorithm. Simulation results will show that the proposed algorithm can effectively determine the near-optimal caching placement and provide significant performance gains over conventional caching placement strategies.

by <a href="">Xi Peng</a>, <a href="">Juei-Chin Shen</a>, <a href="">Jun Zhang</a>, <a href="">Khaled B. Letaief</a> at September 03, 2015 01:30 AM

Achieving Energy Efficiency for Altruistic DISH: Three Properties. (arXiv:1509.00554v1 [cs.NI])

In an altruistic DISH protocol, additional nodes called "altruists" are deployed in a multi-channel ad hoc network to achieve energy efficiency while still maintaining the original throughput-delay performance. The responsibility of altruists is to constantly monitor the control channel and awaken other (normal) nodes when necessary (to perform data transmissions). Altruists never sleep while other nodes sleep as far as possible. This technical report proves three properties related to this cooperative protocol. The first is the conditions for forming an unsafe pair (UP) in an undirected graph. The second is the necessary and sufficient conditions for full cooperation coverage to achieve the avoidance of multi-channel coordination (MCC) problems. The last is the NP-hardness of determining the minimum number and locations of altruistic nodes to achieve full cooperation coverage.

by <a href="">Tony T. Luo</a> at September 03, 2015 01:30 AM

Disaster-Resilient Control Plane Design and Mapping in Software-Defined Networks. (arXiv:1509.00509v1 [cs.NI])

Communication networks, such as core optical networks, heavily depend on their physical infrastructure, and hence they are vulnerable to man-made disasters, such as Electromagnetic Pulse (EMP) or Weapons of Mass Destruction (WMD) attacks, as well as to natural disasters. Large-scale disasters may cause huge data loss and connectivity disruption in these networks. As our dependence on network services increases, the need for novel survivability methods to mitigate the effects of disasters on communication networks becomes a major concern. Software-Defined Networking (SDN), by centralizing control logic and separating it from physical equipment, facilitates network programmability and opens up new ways to design disaster-resilient networks. On the other hand, to fully exploit the potential of SDN, along with data-plane survivability, we also need to design the control plane to be resilient enough to survive network failures caused by disasters. Several distributed SDN controller architectures have been proposed to mitigate the risks of overload and failure, but they are optimized for limited faults without addressing the extent of large-scale disaster failures. For disaster resiliency of the control plane, we propose to design it as a virtual network, which can be solved using Virtual Network Mapping techniques. We select appropriate mapping of the controllers over the physical network such that the connectivity among the controllers (controller-to-controller) and between the switches to the controllers (switch-to-controllers) is not compromised by physical infrastructure failures caused by disasters. We formally model this disaster-aware control-plane design and mapping problem, and demonstrate a significant reduction in the disruption of controller-to-controller and switch-to-controller communication channels using our approach.

by <a href="">S. Sedef Savas</a>, <a href="">Massimo Tornatore</a>, <a href="">M. Farhan Habib</a>, <a href="">Pulak Chowdhury</a>, <a href="">Biswanath Mukherjee</a> at September 03, 2015 01:30 AM

Cooperation under Incomplete Information on the Discount Factors. (arXiv:1012.3117v2 [cs.GT] UPDATED)

In the repeated Prisoner's Dilemma, when every player has a different discount factor, the grim-trigger strategy is an equilibrium if and only if the discount factor of each player is higher than some threshold. What happens if the players have incomplete information regarding the discount factors? In this work we look at repeated games in which each player has incomplete information regarding the other player's discount factor, and ask when a pair of grim-trigger strategies is an equilibrium. We provide necessary and sufficient conditions for such strategies to be an equilibrium. We characterize the states of the world in which the strategies are not triggered, i.e., the players cooperate, in such equilibria (or $\epsilon$-equilibria), and ask whether these "cooperation events" are close to those in the complete information case, when the information is "almost" complete, in several senses.

by <a href="">Cy Maor</a> at September 03, 2015 01:30 AM


Is it possible to edit or search an inactive git branch in emacs?

To what degree is it possible to interact with a git branch that isn't the current one from within Emacs? I have a large software project that takes a long time to build and would like to be able to view, work on, or inspect other branches while the build process is happening. I also sometimes forget what branches I have put things in.

submitted by small-wolf

September 03, 2015 01:04 AM




ANTLR Help (String Literals)

I hardly know anything about ANTLR or the grammars involved; however, I did find an existing grammar for the language I am interested in (PeopleCode, a proprietary language used by Oracle for PeopleTools).

This grammar is reasonably well done but I've noticed an issue with the StringLiteral rule for the lexer:

StringLiteral : '"' ( ~'"' )* '"' ;

This seems to say: "starts with a quote, then zero or more of anything that isn't a quote, and finally ends with a quote". In most languages this would be accurate; however, PeopleCode is bizarre.

To "escape" a quote within a string, you use double quotes. Example:

local string &test = "Hello ""World"""

This would produce the string (Hello "World"), without the parentheses of course.

I'm not sure how to adjust this rule to allow for doubled quotes within the string, and I was hoping someone better at lexers/ANTLR would be willing and able to help :)
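A common lexer fix for doubled-quote escapes (used, for example, in SQL-style grammars) is to match the doubled quote as an alternative before a lone quote, i.e. something like StringLiteral : '"' ( '""' | ~'"' )* '"' ; in ANTLR notation. As a sketch of the same logic in Python (the regex and helper names here are illustrative, not from the grammar in question):

```python
import re

# Mirrors the grammar alternative ( '""' | ~'"' )*: try a doubled quote
# before treating a lone '"' as the end of the literal.
STRING_LITERAL = re.compile(r'"(?:""|[^"])*"')

def unquote(token):
    # Drop the outer quotes, then collapse each doubled quote.
    return token[1:-1].replace('""', '"')

m = STRING_LITERAL.search('local string &test = "Hello ""World"""')
# m.group(0) is the whole literal: "Hello ""World"""
# unquote(m.group(0)) is: Hello "World"
```

Because the alternation tries '""' first, the closing quote is only matched when no doubled quote remains, which is exactly the behavior the question asks for.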


submitted by tslater2006

September 03, 2015 12:58 AM


labeling high frequency signal data

I was curious whether anyone has methodologies they can recommend for systematically labeling (discrete) signals generated from intraday tick data, for use in classification or detection models.

by jharonfe at September 03, 2015 12:53 AM

Planet Theory

Recognizing Weighted Disk Contact Graphs

Authors: Boris Klemz, Martin Nöllenburg, Roman Prutkin
Download: PDF
Abstract: Disk contact representations realize graphs by mapping vertices bijectively to interior-disjoint disks in the plane such that two disks touch each other if and only if the corresponding vertices are adjacent in the graph. Deciding whether a vertex-weighted planar graph can be realized such that the disks' radii coincide with the vertex weights is known to be NP-hard. In this work, we reduce the gap between hardness and tractability by analyzing the problem for special graph classes. We show that it remains NP-hard for outerplanar graphs with unit weights and for stars with arbitrary weights, strengthening the previous hardness results. On the positive side, we present constructive linear-time recognition algorithms for caterpillars with unit weights and for embedded stars with arbitrary weights.

September 03, 2015 12:45 AM

Combinatorial Properties of Triangle-Free Rectangle Arrangements and the Squarability Problem

Authors: Jonathan Klawitter, Martin Nöllenburg, Torsten Ueckerdt
Download: PDF
Abstract: We consider arrangements of axis-aligned rectangles in the plane. A geometric arrangement specifies the coordinates of all rectangles, while a combinatorial arrangement specifies only the respective intersection type in which each pair of rectangles intersects. First, we investigate combinatorial contact arrangements, i.e., arrangements of interior-disjoint rectangles, with a triangle-free intersection graph. We show that such rectangle arrangements are in bijection with the 4-orientations of an underlying planar multigraph and prove that there is a corresponding geometric rectangle contact arrangement. Moreover, we prove that every triangle-free planar graph is the contact graph of such an arrangement. Secondly, we introduce the question whether a given rectangle arrangement has a combinatorially equivalent square arrangement. In addition to some necessary conditions and counterexamples, we show that rectangle arrangements pierced by a horizontal line are squarable under certain sufficient conditions.

September 03, 2015 12:45 AM

Gaussian random projections for Euclidean membership problems

Authors: Ky Vu, Pierre-Louis Poirion, Leo Liberti
Download: PDF
Abstract: We discuss the application of random projections to the fundamental problem of deciding whether a given point in a Euclidean space belongs to a given set. We show that, under a number of different assumptions, the feasibility and infeasibility of this problem are preserved with high probability when the problem data is projected to a lower dimensional space. Our results are applicable to any algorithmic setting which needs to solve Euclidean membership problems in a high-dimensional space.

September 03, 2015 12:44 AM

Extremal Distances for Subtree Transfer Operations in Binary Trees

Authors: Ross Atkins, Colin McDiarmid
Download: PDF
Abstract: Three standard subtree transfer operations for binary trees, used in particular for phylogenetic trees, are: tree bisection and reconnection ($TBR$), subtree prune and regraft ($SPR$) and rooted subtree prune and regraft ($rSPR$). For a pair of leaf-labelled binary trees with $n$ leaves, the maximum number of such moves required to transform one into the other is $n-\Theta(\sqrt{n})$, extending a result of Ding, Grunewald and Humphries. We show that if the pair is chosen uniformly at random, then the expected number of moves required to transfer one into the other is $n-\Theta(n^{2/3})$. These results may be phrased in terms of agreement forests: we also give extensions for more than two binary trees.

September 03, 2015 12:43 AM

Variants of Plane Diameter Completion

Authors: Petr A. Golovach, Clément Requilé, Dimitrios M. Thilikos
Download: PDF
Abstract: The {\sc Plane Diameter Completion} problem asks, given a plane graph $G$ and a positive integer $d$, if it is a spanning subgraph of a plane graph $H$ that has diameter at most $d$. We examine two variants of this problem where the input comes with another parameter $k$. In the first variant, called BPDC, $k$ upper bounds the total number of edges to be added and in the second, called BFPDC, $k$ upper bounds the number of additional edges per face. We prove that both problems are {\sf NP}-complete, the first even for 3-connected graphs of face-degree at most 4 and the second even when $k=1$ on 3-connected graphs of face-degree at most 5. In this paper we give parameterized algorithms for both problems that run in $O(n^{3})+2^{2^{O((kd)^2\log d)}}\cdot n$ steps.

September 03, 2015 12:42 AM

A note on Probably Certifiably Correct algorithms

Authors: Afonso S. Bandeira
Download: PDF
Abstract: Many optimization problems of interest are known to be intractable, and while there are often heuristics that are known to work on typical instances, it is usually not easy to determine a posteriori whether the optimal solution was found. In this short note, we discuss algorithms that not only solve the problem on typical instances, but also provide a posteriori certificates of optimality, probably certifiably correct (PCC) algorithms. As an illustrative example, we present a fast PCC algorithm for minimum bisection under the stochastic block model and briefly discuss other examples.

September 03, 2015 12:42 AM

Finding Near-Optimal Independent Sets at Scale

Authors: Sebastian Lamm, Peter Sanders, Christian Schulz, Darren Strash, Renato F. Werneck
Download: PDF
Abstract: The independent set problem is NP-hard and particularly difficult to solve in large sparse graphs. In this work, we develop an advanced evolutionary algorithm, which incorporates kernelization techniques to compute large independent sets in huge sparse networks. A recent exact algorithm has shown that large networks can be solved exactly by employing a branch-and-reduce technique that recursively kernelizes the graph and performs branching. However, one major drawback of their algorithm is that, for huge graphs, branching still can take exponential time. To avoid this problem, we recursively choose vertices that are likely to be in a large independent set (using an evolutionary approach), then further kernelize the graph. We show that identifying and removing vertices likely to be in large independent sets opens up the reduction space---which not only speeds up the computation of large independent sets drastically, but also enables us to compute high-quality independent sets on much larger instances than previously reported in the literature.

September 03, 2015 12:40 AM

Energy randomness

Authors: Joseph S. Miller, Jason Rute
Download: PDF
Abstract: Energy randomness is a notion of partial randomness introduced by Diamondstone and Kjos-Hanssen to characterize the sequences that can be elements of a Martin-L\"of random closed set (in the sense of Barmpalias, Brodhead, Cenzer, Dashti, and Weber). It has also been applied by Allen, Bienvenu, and Slaman to the characterization of the possible zero times of a Martin-L\"of random Brownian motion. In this paper, we show that $X \in 2^\omega$ is $s$-energy random if and only if $\sum_{n\in\omega} 2^{sn - KM(X\upharpoonright n)} < \infty$, providing a characterization of energy randomness via a priori complexity $KM$. This is related to a question of Allen, Bienvenu, and Slaman.

September 03, 2015 12:40 AM


Quantitative Finance Programming Language

For the past couple of weeks, I have been doing research on quant finance. During this time, I discovered a lot of material, and with it a lot of questions came to my mind. Many news outlets and economic journals/magazines write about HFT/algorithmic trading. Most of them say that companies or software developers prefer to use C++. In some articles, the writers talk about Java, C#, C or even ASM. I tried to find the reason for C++, but I wasn't successful. This topic doesn't provide the answers that I need (Why is C++ still a very popular language in quantitative finance?)

Here are my questions:

  1. Why C++? That some companies might use ASM (and I can only imagine ASM in HFT, where milliseconds play a role) is fine. But in medium-frequency trading or in algorithms? Is it because of speed? I looked for quant finance libraries for C++, but I couldn't find many. The only ones are QuantLib, MatLib and TA-Lib. But no chart APIs/libs or tutorials. It seems like no one is doing tutorials.
  2. Why do some people choose Java? I know Java is a very popular language, has a lot of APIs/libs, and the community is growing. But if speed plays a role, then Java can't be the fastest (because of the virtual machine). Or am I wrong?
  3. Why is no one using Python for medium-frequency trading or algo trading? Python has a lot of APIs/libs like MatLib, TA-Lib and Pyqtgraph. OK, I have to say, Python is not the fastest.
  4. In this discussion (Why is C++ still a very popular language in quantitative finance?), some people claim that C# might be much better for quant finance development. Is this really true? What about libs, APIs, tutorials, etc.?

And my final question: what is the important property when choosing a language for quant finance? I'm not talking about ASM, because it's the fastest language and is used for very complex calculations that have to be made fast. But what about C++, C#, Python and Java? For me, it's important that there are libs and tutorials/examples. At the beginning I started with Python, but after all I have read, I'm not sure about Python anymore.

by Speakard at September 03, 2015 12:34 AM



Understanding the one-writer, many-readers problem?

I need some help understanding the implementation of the one-writer, many-readers problem shown below, as I am new to this concept:

[Figure: pseudocode for the readers-writers implementation, not reproduced]

I have a rough idea that there will be starvation/deadlock among the readers, as there is one writer that multiple readers are trying to access at once. Can anyone please provide some further insight into the problems with this implementation?

I am not looking for any coding; that was just the sample code (pseudocode) I am working with. I am looking for the problem with the implementation. Upon further research I have assumed the following, based on the line "if readcount == 1 then semWait(wrt);":

  • There is one writer and many readers
  • The readers have priority over the writers
  • The possible problem: the writers cannot write until the readers have finished reading; therefore starvation occurs

However upon re-evaluation of the code I have also assumed the following based on the lines:

readcount := readcount - 1;
if readcount == 0 then up

Could I not also say that only one reader may read at a time, and that therefore the other readers will be starved?

So, would either of these be the correct way of interpreting where the problem with the implementation lies, or would I be wrong?
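To make the starvation argument concrete, here is a minimal runnable sketch of the reader-priority scheme the pseudocode describes (Python threading; the names mutex, wrt and readcount follow the usual textbook presentation). As long as readcount stays above zero, wrt is never released, so a continuous stream of arriving readers can starve the writer indefinitely; readers, by contrast, do run concurrently:

```python
import threading

mutex = threading.Semaphore(1)   # protects readcount
wrt = threading.Semaphore(1)     # exclusive access to the shared data
readcount = 0
shared = []

def reader():
    global readcount
    mutex.acquire()
    readcount += 1
    if readcount == 1:           # first reader locks out the writer
        wrt.acquire()
    mutex.release()

    _ = list(shared)             # read the shared data

    mutex.acquire()
    readcount -= 1
    if readcount == 0:           # last reader lets the writer back in
        wrt.release()
    mutex.release()

def writer(item):
    wrt.acquire()
    shared.append(item)          # write under exclusive access
    wrt.release()

threads = [threading.Thread(target=reader) for _ in range(5)]
threads += [threading.Thread(target=writer, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note that the "only one reader at a time" reading of the decrement code does not hold: mutex is held only briefly around the readcount bookkeeping, not during the read itself.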

by Osiris93 at September 03, 2015 12:09 AM


HN Daily

Planet Theory

L-Drawings of Directed Graphs

Authors: Patrizio Angelini, Giordano Da Lozzo, Marco Di Bartolomeo, Valentino Di Donato, Maurizio Patrignani, Vincenzo Roselli, Ioannis G. Tollis
Download: PDF
Abstract: We introduce L-drawings, a novel paradigm for representing directed graphs aiming at combining the readability features of orthogonal drawings with the expressive power of matrix representations. In an L-drawing, vertices have exclusive $x$- and $y$-coordinates and edges consist of two segments, one exiting the source vertically and one entering the destination horizontally.

We study the problem of computing L-drawings using minimum ink. We prove its NP-completeness and provide a heuristics based on a polynomial-time algorithm that adds a vertex to a drawing using the minimum additional ink. We performed an experimental analysis of the heuristics which confirms its effectiveness.

September 03, 2015 12:00 AM

September 02, 2015


Converting this DFA to RE using the Brzozowski Algebraic Method [on hold]

Hello people, I'm trying to convert this DFA to RE using the Brzozowski Algebraic Method.

[Figure: the DFA to be converted to an RE, created with JFLAP]

My Attempt

Write down equations

R1 = bR1 + aR2 + ɛ
R2 = bR2 + aR3
R3 = bR2 + aR1

Eliminate each Rx from its own RHS using Arden's lemma (R = ER + F gives R = E*F)

R1 = b*aR2 + b*
R2 = b*aR3

New equations

R1 = b*aR2 + b*
R2 = b*aR3
R3 = bR2 + aR1

Substitute R2 into R3 and eliminate R3 from RHS

R3 = bb*aR3 + aR1
R3 = (bb*a)*aR1

Substitute R3 into R2

R2 = b*a(bb*a)*aR1

New equations

R1 = b*aR2 + b*
R2 = b*a(bb*a)*aR1
R3 = (bb*a)*aR1

Substitute R2 into R1 and eliminate R1 from RHS

R1 = b*ab*a(bb*a)*aR1 + b*
R1 = (b*ab*a(bb*a)*a)*b*


RE = (b*ab*a(bb*a)*a)*b*


The past exam paper answer is: b*(ab*a(bb*a)*ab*)*

My calculated answer is: (b*ab*a(bb*a)*a)*b*

Are these 2 answers equal or have I done something wrong?

EDIT: The answer is correct even in the form I have provided; the textbook answer goes one step further, using the shifting rule to achieve its format.

Here are some useful rules for simplifying RE's:

  • Shifting Rule: E(FE)* = (EF)*E
  • Denesting Rule: (E*F)*E* = (E+F)*
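The two forms are indeed equal: take E = b* and F = ab*a(bb*a)*a in the shifting rule. As a further sanity check (bounded, so not a proof), the two expressions can be compared mechanically on all strings over {a, b} up to some length:

```python
import re
from itertools import product

mine = re.compile(r"(b*ab*a(bb*a)*a)*b*")
book = re.compile(r"b*(ab*a(bb*a)*ab*)*")

# Compare acceptance on every string over {a, b} of length 0..10.
for n in range(11):
    for w in map("".join, product("ab", repeat=n)):
        assert (mine.fullmatch(w) is None) == (book.fullmatch(w) is None)
```

Both expressions accept, for example, "aaa" and reject "aa", matching the DFA's behavior.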

by gsymeoni at September 02, 2015 11:52 PM

Automated optimization of 0-1 matrix vector multiplication


Is there an established procedure or theory for generating code that efficiently applies a matrix-vector multiplication when the matrix is dense and filled with only zeros and ones? Ideally, the optimized code would make systematic use of previously computed information to reduce duplicated work.

In other words, I have a matrix $M$ and I want to do some precomputation based upon $M$, that will make computing $Mv$ as efficient as possible when I later receive the vector $v$.

$M$ is a rectangular dense binary matrix that is known at "compile time", whereas $v$ is an unknown real vector that is only known at "run time".

Example 1: (sliding window)

Let me use an easy small example to illustrate my point. Consider the matrix, $$M = \begin{bmatrix}1 & 1 & 1 & 1 & 1\\ & 1 & 1 & 1 & 1 & 1 \\ & & 1 & 1 & 1 & 1 & 1\\ & & & 1 & 1 & 1 & 1 & 1\end{bmatrix}.$$ Supposing we apply this matrix to a vector $v$ to get $w = Mv$. Then the entries of the result are, \begin{align} w_1 &= v_1 + v_2 + v_3 + v_4 + v_5\\ w_2 &= v_2 + v_3 + v_4 + v_5 + v_6\\ w_3 &= v_3 + v_4 + v_5 + v_6 + v_7\\ w_4 &= v_4 + v_5 + v_6 + v_7 + v_8 \end{align}

Doing a standard matrix-vector multiplication will compute exactly this way. However, a lot of this work is redundant. We could do the same matrix computation at less cost by keeping track of a "running total", and adding/subtracting to get the next number: \begin{align} w_1 &= v_1 + v_2 + v_3 + v_4 + v_5\\ w_2 &= w_1 + v_6 - v_1\\ w_3 &= w_2 + v_7 - v_2\\ w_4 &= w_3 + v_8 - v_3 \end{align}
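As a sanity check, the running-total evaluation can be compared against a direct multiply; a short numpy sketch with illustrative data:

```python
import numpy as np

# The 4x8 sliding-window matrix from the example: row i sums v_{i+1}..v_{i+5}.
M = np.array([[1 if i <= j < i + 5 else 0 for j in range(8)]
              for i in range(4)], dtype=float)
v = np.arange(1.0, 9.0)  # illustrative v_1..v_8

# Running-total evaluation: one add and one subtract per further output entry.
w = np.empty(4)
w[0] = v[:5].sum()
for i in range(1, 4):
    w[i] = w[i - 1] + v[i + 4] - v[i - 1]

assert np.allclose(w, M @ v)
```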

Example 2: (hierarchical structure)

In the previous example, we could just keep track of a running total. However, usually one would need to create and store a tree of intermediate results. For example, consider $$M = \begin{bmatrix}1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 \\ & & & & 1 & 1 & 1 & 1\\ 1 & 1 \\ & & & & 1 & 1 \\ & & 1 & 1 \\ & & & & & & 1 & 1\end{bmatrix}$$ One could compute $w = Mv$ efficiently using a tree of intermediate results:

  1. Compute $w_5$ and $w_7$, and add them to get $w_3$.
  2. Compute $w_4$ and $w_6$, and add them to get $w_2$.
  3. Add $w_2$ and $w_3$ to get $w_1$
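The three steps above can be sketched and checked numerically (numpy, illustrative data); the tree needs 7 additions where a direct multiply would do 17:

```python
import numpy as np

# The 7x8 hierarchical matrix from the example.
M = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
], dtype=float)
v = np.arange(1.0, 9.0)  # illustrative v_1..v_8

# Leaves first, then combine partial results up the tree.
w = np.empty(7)
w[3] = v[0] + v[1]   # w_4
w[4] = v[4] + v[5]   # w_5
w[5] = v[2] + v[3]   # w_6
w[6] = v[6] + v[7]   # w_7
w[1] = w[3] + w[5]   # w_2 = w_4 + w_6
w[2] = w[4] + w[6]   # w_3 = w_5 + w_7
w[0] = w[1] + w[2]   # w_1 = w_2 + w_3

assert np.allclose(w, M @ v)
```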

The structure in the examples above is easy to see, but for the actual matrices I'm interested in, the structure is not so simple.

Example 3: (low rank)

To clear up some confusion, the matrices are generally not sparse. Specifically, a method solving this problem needs to be able to find efficient methods to apply matrices where large blocks are filled with ones. For example, consider

$$M = \begin{bmatrix}1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & & \\ 1 & 1 & 1 & 1 & & \\ 1 & 1 & 1 & 1 & & \end{bmatrix}.$$

This matrix can be decomposed as a difference of two rank-1 matrices, $$M = \begin{bmatrix}1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1\end{bmatrix} - \begin{bmatrix}& & & & & \\ & & & & & \\ & & & & 1 & 1 \\ & & & & 1 & 1 \\ & & & & 1 & 1\end{bmatrix}$$

so its action on a vector $w := Mv$ can be computed efficiently by, \begin{align} w_1 &= v_1 + v_2 + v_3 + v_4 + v_5 + v_6 \\ w_2 &= w_1 \\ w_3 &= w_2 - v_5 - v_6 \\ w_4 &= w_3 \\ w_5 &= w_4. \end{align}
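This low-rank evaluation can likewise be checked against a direct multiply (numpy, illustrative data):

```python
import numpy as np

# The 5x6 matrix of Example 3: a full block of ones minus a 3x2 block.
M = np.ones((5, 6))
M[2:, 4:] = 0
v = np.arange(1.0, 7.0)  # illustrative v_1..v_6

# Rank-aware evaluation: one full sum, then a small correction.
w = np.empty(5)
w[0] = v.sum()
w[1] = w[0]
w[2] = w[1] - v[4] - v[5]
w[3] = w[2]
w[4] = w[3]

assert np.allclose(w, M @ v)
```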


I'm working on a numerical method for some image processing, and there are several large dense $0-1$ matrices with different structures that are fixed for all time. Later these matrices will need to be applied to many unknown vectors $v_i$ that will depend on the user's input. Right now I'm using pencil-and-paper to come up with efficient code on a per-matrix basis, but I'm wondering if the process can be automated.

by Nick Alger at September 02, 2015 11:15 PM


Simulate stock prices following t distribution in matlab [on hold]

Trying to simulate different price paths for a stock following the t distribution. Can anyone provide code?
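
I can't speak to MATLAB specifics, but as a sketch of the usual approach (shown in Python, with entirely made-up parameters): draw t-distributed shocks via the standard construction Z / sqrt(chi2_df / df) and exponentiate them into a price path.

```python
import math
import random

def student_t_draw(df, rng=random):
    """One Student-t draw via the classic construction N(0,1) / sqrt(chi2_df / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return z / math.sqrt(chi2 / df)

def simulate_path(s0, mu, sigma, df, n_steps, rng=random):
    """One price path whose log returns are (scaled) t-distributed.
    All parameter values below are made up for illustration only."""
    prices = [s0]
    for _ in range(n_steps):
        r = mu + sigma * student_t_draw(df, rng)
        prices.append(prices[-1] * math.exp(r))
    return prices

random.seed(0)  # reproducible example
path = simulate_path(s0=100.0, mu=0.0002, sigma=0.01, df=5, n_steps=252)
```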

by PDouble1 at September 02, 2015 11:01 PM

number of trades - flaw in White Reality Check?

I went through White's paper on the Reality Check for multiple strategy testing. To summarize with a simple example:

I have 2 strategies, s1 and s2. s1 gives 2 signals and therefore 2 returns, s2 gives 10 signals and therefore 10 returns.

Here is the Reality Check procedure:

I compare the means of the returns to some benchmark, e.g., mean(returns) = zero. To get p-values for H0, "strategy returns are zero", I bootstrap both strategies' returns, calculate both means, take the max, and repeat this process a large number of times. I call this array BSvals. From those values I get the p-value of H0(s1) and H0(s2) as the fraction of BSvals that is > the mean return.

However, in this method I see a flaw. Because of the lower number of signals in s1, the variance of the bootstrapped mean of s1 will be much higher than for s2. Therefore s1 will push the p-values of H0 much higher than if s1 had signaled more often. So the mean of returns as a performance statistic might not be suitable when comparing strategies with different numbers of trades/signals. I am thinking of using the t statistic as an unbiased performance measure to compensate for different signal counts, with t = mean(strat returns) * squareroot(number returns) / stddev
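
As a toy illustration of that scaling (Python rather than the backtest's own language, with made-up returns): two strategies with identical per-trade returns, where the one with more signals gets the larger t statistic.

```python
import math
import statistics

def t_stat(returns):
    """t statistic of the mean return against a zero benchmark:
    t = mean * sqrt(n) / sample standard deviation."""
    n = len(returns)
    return statistics.mean(returns) * math.sqrt(n) / statistics.stdev(returns)

# two hypothetical strategies with identical per-trade returns
s1 = [0.02, -0.01]       # 2 signals
s2 = [0.02, -0.01] * 5   # 10 signals
assert t_stat(s2) > t_stat(s1)  # more signals -> larger t for the same mean
```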

What are your thoughts about this? Since strategy returns are not normally distributed, squareroot(number returns) might not be an appropriate scaling factor, though...

by user3276418 at September 02, 2015 11:01 PM




Forecasting using GARCH in R

I am using the predict and ugarchforecast functions in R. When I fit my models and try to forecast, I get either only increasing or only decreasing values for sigma; does anyone know why?

Thank you


eGARCHfit2 = ugarchspec(variance.model=list(model="eGARCH", garchOrder=c(1,1)), mean.model=list(armaOrder=c(0,0), include.mean=TRUE), distribution.model="norm")
eGARCH2 <- ugarchfit(spec=eGARCHfit2, data=brentlog2)

ugarchforecast(eGARCH2, data =brentlog2, n.ahead = 21)
* GARCH Model Forecast * ------------------------------------
* Model: eGARCH
* Horizon: 21 Roll Steps: 0 Out of Sample: 0 0-roll

forecast [T0=1976-06-26 01:00:00]:
Series Sigma
T+1 0.0002619 0.008350
T+2 0.0002619 0.008387
T+3 0.0002619 0.008423
T+4 0.0002619 0.008459
T+5 0.0002619 0.008496
T+6 0.0002619 0.008532
T+7 0.0002619 0.008569
T+8 0.0002619 0.008605
T+9 0.0002619 0.008642
T+10 0.0002619 0.008678

The first column is the mean, which is always constant; the second is sigma, which, as you can see, is always increasing.

by user3384794 at September 02, 2015 10:43 PM


What are you reading this week?

I’ve missed this recurring thread, so I thought I would take initiative and post it myself.

This week I finally finished Peter Matthiessen’s The Snow Leopard. It is especially relevant to me as one of my friends has just moved to Nepal to teach English as a second language, though not in the same region.

As a replacement for The Snow Leopard, I’ve made some headway into Herman Melville’s Moby-Dick. Very atmospheric! I regret that I wasn’t forced to read it in grade school.

I am also tackling The New Nietzsche, although it’s slow going. I have yet to finish the first essay: “Nietzsche and Metaphysical Language,” by Michel Haar. What I admire most in Nietzsche is how he provides a way out of traditional Western philosophy by completely exploding the systems of his time. I haven’t read enough to judge Nietzsche’s conclusions, but I definitely appreciate his beginnings.

The most computer-sciency thing I am reading at the moment is Why Philosophers Should Care About Computational Complexity by Scott Aaronson. I can’t yet offer any insight on the paper, but I would love to see more work done at the intersection of CS and philosophy, which to me seems under-serviced.

by ChadSki at September 02, 2015 10:27 PM


Ant + no IDE. How can I speed up compile times?

I don't use an IDE, and I'm currently using Ant/fsc for my builds. I'm looking into speeding up my compile times. I've started by getting rid of implicits, but have not seen a big change from that.

What would you recommend as the first "low hanging fruit" that I should explore?

submitted by three-cups

September 02, 2015 10:24 PM


How fast is QuickFix?

In my firm we are beginning a new OMS (Order Management System) project, and there is a debate over whether to use QuickFix or go for a professional FIX engine, because there is a common doubt that QuickFix is not fast enough, and obviously we will not get any technical support. I heard that it was used at BOVESPA for a while. They are replacing it with a paid one now. Well, that is enough for me: if they can use it, I can use it.

Should I choose a professional one over QuickFix? Is it not good enough?

by ali_bahoo at September 02, 2015 10:19 PM

Isolating single assets standard deviation in a portfolio accounting for correlation

I am running a simple Monte Carlo analysis in Excel using mean return, standard deviation and the =NORMINV(RAND(),mean,std dev) method. I have a correlation matrix that I use to compute the portfolio variance for varying weights of 16 assets.

I would like to compute the standard deviation of a single asset after accounting for correlations with the portfolio. For example, using the data in the attached picture: if I wanted to run a Monte Carlo analysis on asset 1 after accounting for its correlations with assets 2 and 3 in portfolio 1, what would my steps be? The mean return would simply be the weight of the asset in the portfolio times the asset's return, but what would the standard deviation of the asset be; the standard deviation of the portfolio times the weight of the asset?
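
Not Excel, but the quantity this question seems to be circling is the standard risk-contribution decomposition, where asset i contributes w_i * (Sigma w)_i / sigma_p and the contributions sum to the portfolio volatility. A Python sketch with entirely hypothetical numbers:

```python
import math

def portfolio_risk_contributions(weights, stdevs, corr):
    """Portfolio volatility and each asset's risk contribution,
    CTR_i = w_i * (Sigma w)_i / sigma_p; the contributions sum to sigma_p."""
    n = len(weights)
    # covariance matrix from the stdevs and the correlation matrix
    cov = [[corr[i][j] * stdevs[i] * stdevs[j] for j in range(n)] for i in range(n)]
    sigma_w = [sum(cov[i][j] * weights[j] for j in range(n)) for i in range(n)]
    var_p = sum(weights[i] * sigma_w[i] for i in range(n))
    sigma_p = math.sqrt(var_p)
    ctr = [weights[i] * sigma_w[i] / sigma_p for i in range(n)]
    return sigma_p, ctr

# hypothetical 3-asset portfolio
w = [0.5, 0.3, 0.2]
s = [0.20, 0.15, 0.10]
corr = [[1.0, 0.3, 0.1],
        [0.3, 1.0, 0.2],
        [0.1, 0.2, 1.0]]
sigma_p, ctr = portfolio_risk_contributions(w, s, corr)
assert abs(sum(ctr) - sigma_p) < 1e-12  # contributions add up to portfolio vol
```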


by CuriousGeorge at September 02, 2015 10:17 PM



How to convert valueForKeyPath in swift

My old function in Objective-C is:

+ (NSUInteger)getNumberOfDistinctUsers:(NSArray *)users {
    NSArray *usersAfterPredicate = [users valueForKeyPath:@"@distinctUnionOfObjects.userName"];
    return [usersAfterPredicate count];
}

How do I convert this to Swift? I was trying something like this, but it's crashing with "Could not cast value of type 'Swift.Array' to 'Swift.AnyObject'":

static func getNumberOfDistinctUsers(users: [ICEPKReferenceDataUser]) -> Int {

    var retval: Int = 0
    if let usersAfterPredicate = (users as! AnyObject).valueForKeyPath("@distinctUnionOfObjects.userName") {
        retval = usersAfterPredicate.count
    }
    return retval
}

Can I solve this problem using filter, map, or reduce? I am just trying to find the distinct users in the users array using the property userName.

Edit: brute-force way

static func getNumberOfDistinctUsers(users: [ICEPKReferenceDataUser]) -> Int {

    var retvalSet: Set<String> = []
    for user in users {
        retvalSet.insert(user.userName)
    }
    return retvalSet.count
}

by AAV at September 02, 2015 09:51 PM



Planet FreeBSD

New Release Schedule for PC-BSD

The PC-BSD team has always been dedicated to bringing you the best graphical BSD desktop possible. We received some great feedback after our last release cycle that made us rethink our release schedule.

In the past, we tracked FreeBSD major releases, and also added our own quarterly updates that tended to add in a good bit of code for new features and utilities. Going forward, PC-BSD releases will track FreeBSD releases only, such as 10.2 -> 11.0 -> 11.1. Once the code base is frozen for a major release, an update can be pushed out to EDGE users who wish to act as advanced users and beta testers for the updates. During that several-week testing period, if something goes wrong we'll count on EDGE users to help report issues so that we can quickly get those bugs fixed during the code freeze. After the testing period, once we are confident that the kinks are worked out and EDGE users are happy, we can release the update for PRODUCTION users.

We're also changing the way the EDGE and PRODUCTION branches work a little bit. EDGE packages will now only be built with the 'stable' branch of PC-BSD code, to avoid radical changes that could break functionality in the PC-BSD tool-chain. This also allows us to focus our QA and testing on the new 3rd-party packages themselves. However, all packages that are built on FreeBSD -CURRENT will include PC-BSD's 'master' branch, and are considered bleeding edge. These images will continue to be rolled monthly and are only intended for advanced users or developers who can debug and otherwise help fix issues as they arise.

The PRODUCTION package branch will be switching to a monthly update schedule instead of quarterly. The PC-BSD tool-chain will also be based upon "stable", which will not radically change between releases. We are hoping this change will balance the need for stability against bringing the latest packages / security updates to end users in a timely manner.

by Josh Smith at September 02, 2015 09:14 PM



getdigital has received a cease-and-desist letter from Getty Images for ...

getdigital has received a cease-and-desist letter from Getty Images over the Socially Awkward Penguin, a well-known meme.

They apparently have a trouble magnet in the basement over at getdigital; this is hardly the first time they've been served cease-and-desist letters by someone over alleged copyright problems. Good grief.

The kicker is that Getty offered to settle the matter without lawyers' fees, for "only" 785.40 euros. However, the offer only stands if they keep quiet about it. Presumably so as not to spoil Getty's happy hunting grounds with a warning. So, once again, a heartfelt thanks to getdigital, who won't stand for this kind of dirty business and would rather make it public and accept the legal hassle.

September 02, 2015 09:00 PM


How do you calculate when effective access time is greater than cache access time?

I'm having trouble understanding how to calculate effective access time, hit ratios and cache access times. I'm unfamiliar with the concepts and would love help, or an explanation on how to solve this problem:

Consider a memory system with a main memory of 16 MB and a cache memory of 8,192 bytes. The cache cost is 0.02 cents/bit while the memory cost is 0.001 cents /bit. The cache access time is 50 nanoseconds whereas the memory access time is 400 nanoseconds. If the cache hit ratio is 90% what percent is the effective access time greater than the cache access time? What is the average cost per byte of memory for the system?

Any help is truly appreciated.
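
For what it's worth, the arithmetic can be checked mechanically. Here is a Python sketch of the common textbook formula (hedged: conventions differ on whether a miss also pays the cache lookup time on top of the memory access):

```python
# Worked numbers for the exercise, using the single-level formula
# EAT = h * t_cache + (1 - h) * t_memory. (Some texts instead charge the
# cache lookup on every miss as well: EAT = t_cache + (1 - h) * t_memory.)
hit_ratio = 0.90
t_cache = 50.0     # ns
t_memory = 400.0   # ns
eat = hit_ratio * t_cache + (1 - hit_ratio) * t_memory
percent_over_cache = (eat - t_cache) / t_cache * 100

cache_bytes = 8192
memory_bytes = 16 * 2**20                   # 16 MB
cache_cost = cache_bytes * 8 * 0.02         # cents, at 0.02 cents/bit
memory_cost = memory_bytes * 8 * 0.001      # cents, at 0.001 cents/bit
avg_cost_per_byte = (cache_cost + memory_cost) / (cache_bytes + memory_bytes)

print(eat, percent_over_cache, avg_cost_per_byte)  # ~85 ns, ~70%, ~0.008 cents/byte
```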

by coracora at September 02, 2015 08:46 PM


Zipping streams using JDK8 with lambda

In JDK 8 with lambda (b93) there was a class which could be used to zip streams (this is illustrated in the tutorial Exploring Java8 Lambdas, Part 1, by Dhananjay Nene). This function:

Creates a lazy and sequential combined Stream whose elements are the result of combining the elements of two streams.

However, in b98 this has disappeared. In fact, the Streams class is not even accessible in b98.

Has this functionality been moved, and if so how do I zip streams concisely using b98?

The application I have in mind is in this java implementation of Shen, where I replaced the zip functionality in the

  • static <T> boolean every(Collection<T> c1, Collection<T> c2, BiPredicate<T, T> pred)
  • static <T> T find(Collection<T> c1, Collection<T> c2, BiPredicate<T, T> pred)

functions with rather verbose code (which doesn't use functionality from b98).

by artella at September 02, 2015 08:45 PM


Amazon S3 Update – CloudTrail Integration

You can now use AWS CloudTrail to track bucket-level operations on your Amazon Simple Storage Service (S3) buckets.  The tracked operations include creation and deletion of buckets, modifications to access controls, changes to lifecycle policies, and changes to cross-region replication settings.

AWS CloudTrail records API activity in your AWS account and delivers the resulting log files to a designated S3 bucket. You can  look up API activity related to creating, deleting and modifying your S3 resources using the CloudTrail Console, including access to 7 days of historical data. You can also create Amazon CloudWatch Alarms to look for specific API activities and receive email notifications when they occur.

Effective today, we are now logging actions on S3 buckets to CloudTrail in all AWS Regions. If you have already enabled CloudTrail, you do not need to take any further action in order to take advantage of this new feature. If you are not using CloudTrail, you can turn it on with a couple of clicks; read my introductory post (AWS CloudTrail – Capture API Activity) to learn more.

You can use the log files in many different ways. For example, you can use them as supporting evidence if you need to demonstrate compliance with internal or external policies. Let’s say that you store some important files in an S3 bucket. You can set up a CloudWatch Alarm that will fire if someone else in your organization makes changes to the bucket’s access control policy. This will allow you to verify that the change is in compliance with your policies and to take immediate corrective action if necessary.

You can also monitor creation and deletion of buckets, updates to life cycle policies, and changes to the cross-region replication settings.


by Jeff Barr at September 02, 2015 08:42 PM


Global relabeling heuristic: Push-relabel maxflow

I have a correct, working implementation of the preflow-push-relabel maxflow algorithm [2]. I am trying to implement the global relabeling update heuristic [3], but have run into some issues.

I have a specific instance of the problem here to illustrate my questions (figure of the problem instance with its distance labels omitted):

a) For this problem instance, the "current preflow" is a state that is reached by my implemented max flow algorithm [[ even without global relabeling heuristic we reach this state ]]. Without applying any heuristics the algorithm proceeds to completion from this state giving the correct result.

b) The distance labels (in green) at this state represent a valid labeling, since $d_u \leq d_v + 1$ for every $(u, v) \in E_{residual}$.

Q1: The distance label (or height) is supposed to be a lower bound on the distance from the sink. However, for several nodes in the residual graph (e.g., 6, 7) this is not true. E.g., node 7 has a distance label of 14... but clearly has a distance of 1 from the sink in the residual graph.

Q2: On running the global relabel at this stage, we get the labeling seen at the extreme right. From this point on (depending on how frequently you do the global update), the algorithm can get stuck [[ deadlock ]] -- e.g., nodes 6 and 3 keep circulating flow between them as they each get relabeled to a greater height. If you run the global update before they reach the final height, you will reset the heights and the process repeats.

I am reasonably sure I am making a very trivial error but am unable to put my finger on it. Can someone help me with this issue? I am happy to provide a code snippet if that would help.

A couple more points that may be relevant:
-> I am implementing a single phase version of the push-relabel algorithm (ie. do not stop when you have a min. cut, but continue till you obtain a valid flow).
-> I am processing active vertices in FIFO order in the current problem instance. But I have a highest label first [3] implementation as well. I do not think this affects the two questions I have.
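
For reference, the global relabeling heuristic itself is just a backwards BFS from the sink over residual edges. A rough Python sketch (the residual-graph representation below is hypothetical, not taken from the question's implementation):

```python
from collections import deque

def global_relabel(num_nodes, sink, residual):
    """Global relabeling sketch: BFS backwards from the sink through residual
    edges, so that height[u] becomes the exact residual distance from u to
    the sink. residual[u] maps v -> remaining capacity on edge (u, v).
    Nodes that cannot reach the sink keep height INF here (implementations
    usually reset those to n or 2n instead)."""
    INF = float("inf")
    height = [INF] * num_nodes
    height[sink] = 0
    queue = deque([sink])
    while queue:
        v = queue.popleft()
        for u in range(num_nodes):
            # u is one step further from the sink if (u, v) still has capacity
            if residual[u].get(v, 0) > 0 and height[u] == INF:
                height[u] = height[v] + 1
                queue.append(u)
    return height

# toy residual graph: 0 -> 1 -> 2 (sink); edge 0 -> 2 already saturated
assert global_relabel(3, 2, [{1: 5}, {2: 5}, {}]) == [2, 1, 0]
```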

Thank You!

[[ This question has also been posted on programmers stackexchange -- I was unsure which forum is better for this question ]]

[2] A. Goldberg and R. Tarjan, "A new approach to the maximum flow problem", JACM, Vol. 35, Issue 4, 1988.

[3] B. V. Cherkassky and A. Goldberg, "On implementing the push-relabel method for the maximum flow problem".

by user26203 at September 02, 2015 08:22 PM


Introducing the AWS SDK for C++

My colleague Jonathan Henson sent me a guest post to introduce a brand-new AWS SDK!

— Jeff;

After a long effort, we are proud to announce an open-source C++ SDK for scaling your native applications with Amazon Web Services.

The AWS SDK for C++ is a modern C++ interface with lightweight dependencies. We designed it to be fully functioning, with both low-level and high-level interfaces. However, we also wanted it to have as few dependencies as possible and to be platform-independent. At the moment, it includes support for Windows, OS X, Linux, and mobile platforms.

This SDK has been specifically designed with game developers in mind, but we have also worked hard to maintain an interface that will work for systems engineering tasks, as well as other projects that simply need the efficiency of native code.


  • Works with the Standard Template Library (STL).
  • Custom memory management support.
  • C++ 11 features used and supported.
  • Builds with CMake and your native compiler toolchain.
  • Lightweight dependencies.
  • Exception-safe.
  • Extensive, configurable logging.
  • Default credentials providers.
  • Identity management through Amazon Cognito Identity.
  • High-level Amazon S3 interface through TransferClient.
  • Uses native OS APIs for cryptographic and HTTP support.

Code Samples

Storing a value in a Amazon DynamoDB table:

Aws::DynamoDB::DynamoDBClient dynamoDbClient;
PutItemRequest putItemRequest;
AttributeValue hashKeyAttribute;
putItemRequest.AddItem("HashKey", hashKeyAttribute);
AttributeValue valueAttribute;
putItemRequest.AddItem("Value", valueAttribute);
auto putItemOutcome = dynamoDbClient.PutItem(putItemRequest);

if (putItemOutcome.IsSuccess())
    std::cout << "PutItem Success Using IOPS " << putItemOutcome.GetResult().GetConsumedCapacity();
else
    std::cout << "PutItem failed with error " << putItemOutcome.GetError().GetMessage();

Downloading a file from Amazon Simple Storage Service (S3):

Aws::S3::S3Client s3Client;
GetObjectRequest getObjectRequest;
getObjectRequest.SetResponseStreamFactory([]() {
    return Aws::New<Aws::FStream>(ALLOCATION_TAG, DOWNLOADED_FILENAME, std::ios_base::out | std::ios_base::in | std::ios_base::trunc);
});
auto getObjectOutcome = s3Client.GetObject(getObjectRequest);

if (getObjectOutcome.IsSuccess())
    std::cout << "File downloaded from S3 to location " << DOWNLOADED_FILENAME;
else
    std::cout << "File download failed from s3 with error " << getObjectOutcome.GetError().GetMessage();

It’s that simple! Download the code from GitHub today and start scaling your C++ applications with the power of Amazon Web Services.

We are launching the AWS SDK for C++ in its current experimental state while we gather feedback from users and the open source community to harden the APIs. We also are adding support for individual services as we become more confident that the client generator can properly support each protocol. Support for more services will be coming in the near future. We invite our customers to follow along with our progress and join the development efforts by submitting pull requests and sending us feedback and ideas via GitHub Issues.

Jonathan Henson, Software Development Engineer (SDE)

by Jeff Barr at September 02, 2015 08:09 PM



Memory storage capacity of Bienenstock-Cooper-Munro rule

I would like to know the memory storage capacity of the BCM learning rule when it is implemented on a Hopfield network. I understand that it will be a function of $n$, where $n$ is the number of neurons.

by Omar Shehab at September 02, 2015 07:38 PM

What's Wrong with this LL(1) Grammar?

I am trying to build a LL(1) parse table for the following Grammar:

S -> L
L -> L : L
L -> id R
R -> ( L )
R -> ( )
R -> Epsilon

There are two problems here. First, the L rules are left-recursive; second, the R rules share ( as a common prefix. So I modified the grammar to this one:

S -> L
L -> id R X
X -> : L X
X -> Epsilon
R -> ( P
R -> Epsilon
P -> L )
P -> )

I then checked the grammar with the JFLAP tool, and it says that my grammar is not LL(1). But I just don't see the problem.
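
One way to see what JFLAP is objecting to is to compute FIRST/FOLLOW mechanically. A rough Python sketch (my own scratch code, not JFLAP's algorithm) for the transformed grammar:

```python
EPS = "eps"
grammar = {
    "S": [["L"]],
    "L": [["id", "R", "X"]],
    "X": [[":", "L", "X"], [EPS]],
    "R": [["(", "P"], [EPS]],
    "P": [["L", ")"], [")"]],
}
nonterminals = set(grammar)

def first_of_seq(seq, first):
    # FIRST of a symbol sequence, given the current FIRST sets
    out = set()
    for sym in seq:
        f = first[sym] if sym in nonterminals else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)  # the whole (possibly empty) sequence can derive epsilon
    return out

first = {nt: set() for nt in nonterminals}
changed = True
while changed:                       # FIRST fixpoint
    changed = False
    for nt, prods in grammar.items():
        for prod in prods:
            f = first_of_seq(prod, first)
            if not f <= first[nt]:
                first[nt] |= f
                changed = True

follow = {nt: set() for nt in nonterminals}
follow["S"].add("$")
changed = True
while changed:                       # FOLLOW fixpoint
    changed = False
    for nt, prods in grammar.items():
        for prod in prods:
            for i, sym in enumerate(prod):
                if sym not in nonterminals:
                    continue
                tail = first_of_seq(prod[i + 1:], first)
                add = tail - {EPS}
                if EPS in tail:
                    add |= follow[nt]
                if not add <= follow[sym]:
                    follow[sym] |= add
                    changed = True

# X is nullable and ':' is both in FIRST(X -> : L X) and in FOLLOW(X),
# so on lookahead ':' the parser cannot choose between the two X-productions.
assert EPS in first["X"] and ":" in follow["X"]
```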

Can somebody help me out here?

regards Alex

by AdelPoint at September 02, 2015 07:22 PM

Planet Emacsen

Irreal: Pages in Emacs

Eric James Michael Ritz over at One More Game-Dev and Programming Blog has a nice post on how and why to use pages in Emacs. Pages in Emacs are text separated by form feeds (0xC)1. As Ritz explains, it's easy to move between pages by using 【Ctrl+x [】 (backward-page) and 【Ctrl+x ]】 (forward-page).

Most of us don't use pages but Ritz gives us a good use case. As he points out, you can install a package such as Steve Purcell's page-break-lines package that will make the breaks explicit and others that make navigation easier. Go read his post and see if you can make pages work for you.



1. Although, like everything else in Emacs, that's configurable. It's controlled by the page-delimiter variable.

by jcs at September 02, 2015 07:16 PM


Equivalence in finite sets of turing machines

I have this exercise and I really don't know how to complete it:

Prove that for all finite sets S of Turing Machines it is decidable
whether or not given any two Turing Machines M1 and M2 in S they
recognize the same language, that is, L(M1)=L(M2)

I know that the problem of equivalence of languages is undecidable in general, but the additional hypothesis that the set S of machines is finite does not suggest to me any reason why this case should be different. Can somebody help me?

by Giulia Frascaria at September 02, 2015 07:12 PM


Computing the DAG of a program given source code or AST

I've seen many papers on scheduling components or tasks once a DAG for the program is known, either by user-input or by domain restriction (i.e. all cross shaped 5-pt stencil codes have a known DAG).

Is there an algorithm for taking code (or Abstract syntax trees) from a language, say Python, and generating a DAG for it? I've been searching for a paper related to this for a long time and haven't found anything on google scholar.

Intuitively, I'd say that it's not possible, since some languages (such as Python) can change just about everything at run-time, and thus to determine the behavior and dependencies of a given program you would have to run it first for every possible set of inputs.

EDIT: I'm using DAG in this sense as the Directed Acyclic Graph for the dependencies (i.e. which parts depend on which other parts)
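
As a tiny illustration of why the dependence analysis is easy for restricted programs and hard in general, here is a toy Python sketch (my own, not from any paper) that builds def-use edges for straight-line assignments via the ast module; any dynamic feature (getattr, exec, aliasing) immediately escapes it:

```python
import ast

def straightline_dag(source):
    """Edges (dependency, target) for a straight-line sequence of simple
    assignments. Anything dynamic is invisible to this analysis, which is
    exactly the difficulty raised above."""
    edges = set()
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign):
            target = stmt.targets[0].id
            for node in ast.walk(stmt.value):
                if isinstance(node, ast.Name):
                    edges.add((node.id, target))
    return edges

edges = straightline_dag("a = 1\nb = a + 2\nc = a * b\n")
assert edges == {("a", "b"), ("a", "c"), ("b", "c")}
```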

by Snakes and Coffee at September 02, 2015 06:33 PM


Maintaining complex state in Haskell

Suppose you're building a fairly large simulation in Haskell. There are many different types of entities whose attributes update as the simulation progresses. Let's say, for the sake of example, that your entities are called Monkeys, Elephants, Bears, etc..

What is your preferred method for maintaining these entities' states?

The first and most obvious approach I thought of was this:

mainLoop :: [Monkey] -> [Elephant] -> [Bear] -> String
mainLoop monkeys elephants bears =
  let monkeys'   = updateMonkeys   monkeys
      elephants' = updateElephants elephants
      bears'     = updateBears     bears
  in
    if shouldExit monkeys elephants bears then "Done" else
      mainLoop monkeys' elephants' bears'

It's already ugly having each type of entity explicitly mentioned in the mainLoop function signature. You can imagine how it would get absolutely awful if you had, say, 20 types of entities. (20 is not unreasonable for complex simulations.) So I think this is an unacceptable approach. But its saving grace is that functions like updateMonkeys are very explicit in what they do: They take a list of Monkeys and return a new one.

So then the next thought would be to roll everything into one big data structure that holds all state, thus cleaning up the signature of mainLoop:

mainLoop :: GameState -> String
mainLoop gs0 =
  let gs1 = updateMonkeys   gs0
      gs2 = updateElephants gs1
      gs3 = updateBears     gs2
  in
    if shouldExit gs0 then "Done" else
      mainLoop gs3

Some would suggest that we wrap GameState up in a State Monad and call updateMonkeys etc. in a do. That's fine. Some would rather suggest we clean it up with function composition. Also fine, I think. (BTW, I'm a novice with Haskell, so maybe I'm wrong about some of this.)

But then the problem is, functions like updateMonkeys don't give you useful information from their type signature. You can't really be sure what they do. Sure, updateMonkeys is a descriptive name, but that's little consolation. When I pass in a god object and say "please update my global state," I feel like we're back in the imperative world. It feels like global variables by another name: You have a function that does something to the global state, you call it, and you hope for the best. (I suppose you still avoid some concurrency problems that would be present with global variables in an imperative program. But meh, concurrency isn't nearly the only thing wrong with global variables.)

A further problem is this: Suppose the objects need to interact. For example, we have a function like this:

stomp :: Elephant -> Monkey -> (Elephant, Monkey)
stomp elephant monkey =
  (elongateEvilGrin elephant, decrementHealth monkey)

Say this gets called in updateElephants, because that's where we check to see if any of the elephants are in stomping range of any monkeys. How do you elegantly propagate the changes to both the monkeys and elephants in this scenario? In our second example, updateElephants takes and returns a god object, so it could effect both changes. But this just muddies the waters further and reinforces my point: With the god object, you're effectively just mutating global variables. And if you're not using the god object, I'm not sure how you'd propagate those types of changes.

What to do? Surely many programs need to manage complex state, so I'm guessing there are some well-known approaches to this problem.

Just for the sake of comparison, here's how I might solve the problem in the OOP world. There would be Monkey, Elephant, etc. objects. I'd probably have class methods to do lookups in the set of all live animals. Maybe you could look up by location, by ID, whatever. Thanks to the data structures underlying the lookup functions, they'd stay allocated on the heap. (I'm assuming GC or reference counting.) Their member variables would get mutated all the time. Any method of any class would be able to mutate any live animal of any other class. E.g. an Elephant could have a stomp method that would decrement the health of a passed-in Monkey object, and there would be no need to pass that change back up the call stack.

Likewise, in an Erlang or other actor-oriented design, you could solve these problems fairly elegantly: Each actor maintains its own loop and thus its own state, so you never need a god object. And message passing allows one object's activities to trigger changes in other objects without passing a bunch of stuff all the way back up the call stack. Yet I have heard it said that actors in Haskell are frowned upon.

by rlkw1024 at September 02, 2015 06:23 PM


Bottleneck TSP with MST

There is a problem I don't know the answer to: the 3-approximation for the bottleneck TSP that involves first getting the MST. I have not been able to come up with the right "shortcut" method so far. Please help!

To be more specific:

The problem comes from Cormen's Introduction to Algorithms, 3rd edition. The problem is

"In the bottleneck traveling-salesman problem, we wish to find the hamiltonian cycle that minimizes the cost of the most costly edge in the cycle. Assuming that the cost function satisfies the triangle inequality, show that there exists a polynomial time approximation algorithm with approximation ratio 3 for this problem. (Hint: Show recursively that we can visit all the nodes in a bottleneck spanning tree, as discussed in Problem 23-3, exactly once by taking a full walk of the tree and skipping nodes, but without skipping more than two consecutive intermediate nodes. Show that the costliest edge in a bottleneck spanning tree has a cost that is at most the cost of the costliest edge in a bottleneck hamiltonian cycle.)"

I can prove that the MST is also a bottleneck spanning tree; however, when I search online and on Google Scholar for the step where we walk the bottleneck spanning tree, all of the articles are about much better approximations, and I can't seem to find any reference to this type of argument online.

by mmmi at September 02, 2015 06:15 PM


(False?) proof for computability of a function?

Consider $f(n)$, a function that returns 1 iff $n$ zeros appear consecutively in $\pi$. Now someone gave me a proof that $f(n)$ is computable:

Either for all $n$, $0^n$ appears in $\pi$, or there is an $m$ s.t. $0^m$ appears in $\pi$ and $0^{m+1}$ does not. For the first possibility, $f(n) := 1$; for the second one, $f(n) := 1$ iff $n \leq m$, 0 otherwise.

The author claims that this proves computability of $f(n)$, as there exists an algorithm to compute it.

Is this proof correct?

by Mike B. at September 02, 2015 06:12 PM



2 Dimensional Subset Sum: looking for information

I do not know if this problem exists under a different name; if it does, I could not find it.

The problem is this:

Given a set $S$ of $n$ points in $\mathbb{Z}^2$, is there a subset $A\subset S$ such that the points of $A$ sum up to $(0,0)$?

If we restrict in $\mathbb{Z}$ this is the classic Subset-Sum problem. Furthermore this can be generalized in n-dimensions if all points are in $\mathbb{Z}^n$.
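
For concreteness, a naive exact decision procedure (Python, exponential time; it only pins down the problem statement, not an efficient algorithm):

```python
from itertools import combinations

def has_zero_subset(points):
    """Naive exact check: does some non-empty subset of the 2-D points sum
    to (0, 0)? Exponential in len(points); a reference for small instances."""
    for r in range(1, len(points) + 1):
        for subset in combinations(points, r):
            if sum(x for x, _ in subset) == 0 and sum(y for _, y in subset) == 0:
                return True
    return False

assert has_zero_subset([(3, 1), (-1, 2), (-2, -3)])   # all three sum to (0, 0)
assert not has_zero_subset([(1, 1), (2, 3)])
```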

Since Subset-Sum is NP-complete, so is this problem; thus, I am interested in approximating it. (Here there is another issue, even for the classic Subset-Sum: since the target is $0$, how would the approximation ratio be represented? Maybe using the largest integer or the mean of $S$?)

Subset-Sum has an FPTAS, but I don't think that is the case for 2D Subset-Sum. One idea is to solve only in the $x$ dimension and, from all the admissible solutions, pick the one that minimizes the sum in the $y$ dimension. But this cannot guarantee a ratio for the $y$ sum.

Does anyone have any information for me? Even bad news (inapproximability) is good news. The only close problem that I managed to locate is the Weighted Distribution problem (given a set of binary vectors, ask whether there is a subset that sums to the zero vector), coming from coding theory, proven to be $W[2]$-complete in this paper, thus not in EPTAS.

EDIT 1: Now that I look at it again, I do not think that Weighted Distribution is similar. It is more like the classic Subset-Sum, since every bit vector can just represent a number. Now I am looking at multidimensional Knapsack.

EDIT 2: An idea to transform it to 1D-SS is to take the $xx'$ (and afterwards the $yy'$) projections. The $xx'$ projection will give you a good $x$ value, but for the $y$ axis we cannot guarantee anything. Another approach, generalizing the classic SS approximation idea (see here and here), is to create rings in $\mathbb{Z}^2$, but this also does not work well. Maybe if we take multiple projections and choose the best?

by Harry at September 02, 2015 06:09 PM



Algorithm to group two-dimensional data points

I feel like there should be a known algorithm to the following problem, but I am short of ideas how to construct or search for it.

Suppose as an input you have a list of two-dimensional data points (xi, yi). Your goal is to "simplify" this data by providing an output that is equivalent to the input, but grouped together using structures like this: ({x1, x2, .., xn}, {y1, y2, .., ym}), which denotes a set of all (xa, yb) where a ∈ {1, 2, .., n} and b ∈ {1, 2, .., m}. A perfect result is to have as few of these structures + remaining data points as possible.

Some points to consider:

  • The output can be a combination of simple data points and such structures.
  • The structures in the output should be mutually exclusive.
  • It may as well be that there is no way to simplify a given input.

Example 1:

Input: (1, 1), (1, 2), (2, 1), (2, 2), (4, 5)
Output: ({1, 2}, {1, 2}), (4, 5)

Example 2:

Input: (1, 1), (1, 2), (1, 3), (4, 5) Output: ({1}, {1, 2, 3}), (4, 5)

Example 3:

Input: (1, 2), (2, 2), (3, 2), (2, 1), (2, 3) Output: ({1, 2, 3}, {2}), ({2}, {1, 3})

by dramblys at September 02, 2015 06:07 PM


Google is now swinging the big ban hammer at ...

Google is now swinging the big ban hammer at websites that first shove a full-page "why not download our app!1!!" ad in your face. Very nice! FINALLY!

And now please also downrank all websites that pester you with "like us on Facebook", and all YouTube videos that overlay that "please subscribe" bullshit!

Oh, and that EU cookie crap can go in the bin too. Few things in recent memory have backfired as badly as that.

September 02, 2015 06:01 PM



Can removal of unit productions from a grammar increase the total number of productions?

I have a doubt!

The problem is:

There are m variables in a grammar. The number of productions after removal of unit productions, in the worst case, is (assume there are no null productions):

(a) O(m)

(b) O($m^2$)

(c) O($k^m$)

(d) O($2^m$)

Let me try to explain my reasoning:

The question asks about the number of productions remaining in the minimized grammar. If there are no null productions, that is an advantage: as you know, when we convert a grammar with null productions, removing all of them generally introduces new productions into the resulting grammar (it can be up to the set of all subsets of the variables on the longest RHS, so exponential).

But in the given grammar you don't have null productions, so you are removing only unit productions (i.e., productions of the form A → B), and the resulting grammar has at most m productions, where m is the number of variables. Example: S → A, A → B, B → a/b/c; the resulting grammar is S → a/b/c. So it is O(m).

Still, I have a doubt: does it depend only on the number of variables? And I am confused: shouldn't it depend on the number of non-terminals on the RHS of the productions? I am clueless. I know that removal of null productions can increase the total number of productions. I need an explanation. This is not homework.

by user4791206 at September 02, 2015 05:59 PM


A probabilistic set with no false positives?

So, Bloom filters are pretty cool -- they are sets that support membership checking with no false negatives, but a small chance of a false positive. Recently though, I've been wanting a "Bloom filter" that guarantees the opposite: no false positives, but potentially false negatives.

My motivation is simple: given a huge stream of items to process (with duplicates), we'd like to avoid processing items we've seen before. It doesn't hurt to process a duplicate, it is just a waste of time. Yet, if we neglected to process an element, it would be catastrophic. With a "reverse Bloom filter", one could store the items seen with little space overhead, and avoid processing duplicates with high probability by testing for membership in the set.

Yet I can't seem to find anything of the sort. The closest I've found are "retouched Bloom filters", which allow one to trade selected false positives for a higher false negative rate. I don't know how well their data structure performs when one wants to remove all false positives, however.

Anyone seen anything like this? :)

by Christopher Monsanto at September 02, 2015 05:52 PM


Wait for latest values from dependent streams in BaconJS?

I have 3 streams. gradingResult and contextId depend on studentResponse. I need to fire one event, and only one event (otherwise this is trivial), when all 3 have the latest values.

I've tried #combineTemplate and #sampledBy studentResponse. Unfortunately, I always see the wrong data: gradingResult and contextId have the old values in the combined template. How can I wait for all streams to have the latest values?

Code is shown below:

var studentResponse = new Bacon.Bus();
var gradingResult = new Bacon.Bus();
var contextId = new Bacon.Bus();

studentResponse.onValue(function(f) {
  // (elided: grade f and push results onto gradingResult and contextId)
});

Bacon.combineTemplate({
  studentResponse: studentResponse,
  gradingResult: gradingResult,
  contextId: contextId
}).onValue(function(t) {
  // t.gradingResult and t.contextId still hold the old values here
});
Link to jsfiddle:

UPDATE: this is a contrived example. In the real code, gradingResult is an ajax request. Both gradingResult and contextId have time dependencies on studentResponse.

by U Avalos at September 02, 2015 05:51 PM

How to implement map that associates a name with void function in C?

After reading Structure and Interpretation of Computer Programs (SICP) I decided to find a way to implement some of these functional programming techniques in C. I tried to write a program that makes a pair whose first element is the name of a function and whose second is any function that takes one argument and returns one. Using the implementation below I was expecting to see output like:

fact(7) = 5040
fib(7) = 13

but instead I am getting

fact(7) = 5040
fib(7) = 0

along with warnings

$ cc map.c
map.c: In function ‘main’:
map.c:41:17: warning: assignment from incompatible pointer type [enabled by default]
   maps[0].f_ptr = &fact;
map.c:43:17: warning: assignment from incompatible pointer type [enabled by default]
   maps[1].f_ptr = &fib;
map.c:47:7: warning: passing argument 1 of ‘maps[i].f_ptr’ makes pointer from integer without a cast [enabled by default]
       ans = (int) maps[i].f_ptr((int) num);
map.c:47:7: note: expected ‘void *’ but argument is of type ‘int’
map.c:47:13: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
       ans = (int) maps[i].f_ptr((int) num);
map.c:52:7: warning: passing argument 1 of ‘maps[i].f_ptr’ makes pointer from integer without a cast [enabled by default]
       ans2 = (int) maps[i].f_ptr((int) num);
map.c:52:7: note: expected ‘void *’ but argument is of type ‘int’
map.c:52:14: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
       ans2 = (int) maps[i].f_ptr((int) num);

during compilation. Looking at the code I don't see the problem but then again I haven't used C in quite some time. Is there a better way to implement such a construct and why is fib(7) printing a 0 instead of 13?

Here's my code:

#include <stdio.h>

struct Map {
  char* name;
  void* (*f_ptr)(void*);
};

int fact(int a) {
    if (a == 0)
      return 0;
    if (a == 1)
      return 1;
    return a * fact (a-1);
}

int fib(int a) {
  if (a == 0)
    return 0;
  if (a == 1)
    return 1;
  return fib(a-1) + fib(a-2);
}

int findFunc (char* str, struct Map map) {
  if ( == str)
    return 1;
  return 0;
}

int main() {
  int i = 0;
  int ans = 0;
  int ans2 = 0;
  int num = 7;

  struct Map maps[2];
  maps[0].name = "fact";
  maps[0].f_ptr = &fact;
  maps[1].name = "fib";
  maps[1].f_ptr = &fib;

  for (i; i < (sizeof(maps)/sizeof(maps[0])); i++) {
    if (findFunc("fact", maps[i]))
      ans = (int) maps[i].f_ptr((int) num);
  }

  for (i; i < (sizeof(maps)/sizeof(maps[0])); i++) {
    if (findFunc("fib", maps[i]))
      ans2 = (int) maps[i].f_ptr((int) num);
  }

  printf("fact(%d) = %d\n", num, ans);
  printf("fib(%d) = %d", num, ans2);
  return 0;
}

by TarBall at September 02, 2015 05:47 PM


Speaker list for Haskell eXchange conference, 2-day Hackathon, and promo code

Hey everyone!

The programme for Haskell eXchange in London is finally getting settled, we’ll have 4 keynotes over the two days from Simon Peyton Jones, Lennart Augustsson, Simon Marlow and Luite Stegeman!

To check out the full list of speakers and their topics, go here:

Descriptions for some of the keynotes are still missing; hopefully they will be added soon. We're also announcing a 2-day Hackathon over the following weekend, where the focus will be on improving the Haskell infrastructure. Anyone should feel free to come, and if you're a beginner who wants to start contributing to the Haskell infrastructure, you'll be able to get help from experienced contributors.

Read the full description and register here (it’s free):

Finally, if you want to get 25% off the conference ticket, you can use the promo code HASKELL-EXCHANGE-25; it should be valid until September 12th!

Hope to see you there! :)

by fredrik at September 02, 2015 05:34 PM


Does a reentrant list for signal queue in a single-thread environment exist?

I need to handle Unix signals in a single-threaded application with the following goals:

  1. Signals are not masked on receive (thus, the signal handlers must be reentrant).
  2. I am not allowed to lose signal data (thus, if a new signal comes before the handler of the previous one returns, it must also be handled correctly).

I have the common multi-threading primitives (spinlocks, semaphores, etc.). But they don't seem to be enough, because my higher-level data structures (even one as simple as a list) aren't thread-safe.

My initial idea was the following:

  1. I use a list in which I quickly store the data of each incoming signal,
  2. and process it (call the possibly much slower handlers) later, outside the critical section.

The main problem with that is that the list data structure isn't thread-safe. If I lock it, I can't store a second signal anywhere. I can't wait until the previous handler exits, because on the second signal it is essentially suspended inside a critical section.

Simply put, I don't have any idea how to handle the following scenario:

  1. signal1 comes, the process suspends, and the handler of signal1 starts.
  2. signal2 comes, the handler of signal1 suspends, and the handler of signal2 starts.
  3. The handler of signal2 returns, and execution returns to the handler of signal1.
  4. The handler of signal1 returns, and execution returns to the main program.

After thinking about it a lot, I have the impression that maybe my problem is unsolvable. Am I right? How do operating systems handle similar problems (for example, possibly bursting interrupts from hardware)?

by peterh at September 02, 2015 05:33 PM


Deciding functionality of transducers over infinite words

Given a finite state transducer defining a rational relation over infinite words, it is known to be decidable whether or not the relation is a function, i.e., whether each infinite input word is related to at most one infinite output word. This is detailed in a paper by Gire, Two Decidability Problems for Infinite Words. Unfortunately, I cannot find the full text of the paper anywhere. The basic idea seems to be to form the composition of the transducer with its inverse, $T \circ T^{-1}$, and check whether the resulting transducer is a restriction of the identity function. Note that the inverse of a transducer $T$ is $T$ with the input and output words swapped on each transition.

I am looking for details of the decision procedure. Do you have any references or a short description of the algorithm?

by Markus at September 02, 2015 05:31 PM


Raptor Algorithm: Differences between trip and route

I'm reading Microsoft's paper about an algorithm for public transit routing named Round-Based Public Transit Routing, RAPTOR for short. In this paper, the authors give some definitions that I don't understand well:

About trip:

trip represents a sequence of stops a specific vehicle (train, bus, subway, . . . ) visits along a line

About route:

Each route consists of the trips that share the same sequence of stops.

So, if I understand well, trip means a "bus route" and route is a collection of trips that make a path, right? If my guess is true, why does the author state:

Typically there are more routes than trips

And if a route is just a collection of trips, how can I build all the routes?

Thanks :)

by Trần Kim Dự at September 02, 2015 05:30 PM




Dear readers, today I have to do something for justice ...

Dear readers, today I have to do something for justice. I got a message from Cologne complaining that this blog only ever rags on screw-ups from Berlin, and maybe occasionally Hamburg, when Cologne is right up there with them. Cologne can also botch airports convincingly, and elections are beyond Cologne too :-)

Update: Cologne even has an answer to the Elbphilharmonie :-)

September 02, 2015 05:01 PM

The Landgericht Hamburg has once again surveyed the situation ...

The Landgericht Hamburg has once again surveyed the situation and handed down the ruling that does the most damage: "DIN standards, too, can be protected by copyright".

Bonus: the case was directed against a US website.

Update: Yes, folks, this is old news. That wasn't my point. My point was: this would have been the opportunity to elegantly pull the business model out from under the Beuth Verlag. There is no justification whatsoever for anyone earning money from access to standards. Either it is a standard whose purpose is to be a general rule for the interoperable building of things, in which case it must be freely accessible to everyone, or it is not a standard, in which case the market can sort it out for all I care. But what we have here is precisely not a market in which anybody else could compete; rather, we have standards I am required to comply with, but only one party is allowed to sell them to me. The Landgericht Hamburg could have said here: no, friends, this is not protected by copyright. By its nature it is like an official document.

September 02, 2015 05:01 PM

Another mail came in about the Zalando story, ...

Another mail came in about the Zalando story, explaining to me that in retail the stock indicators are generally correct, but that the shipper always keeps slightly more items in stock than are displayed, so that it can still deliver when several customers order in a race condition. Ordering processes simply consist of several steps, and apparently the systems don't always update their inventory data in real time either.

September 02, 2015 05:01 PM




How do I satisfy IntelliJ IDEA's expectation that my scala class's companion object's declaration be tested?

case class Point(x: Double = 0, y: Double = 0, z: Double = 0) extends Point3D(x, y, z) {
  def +(that: Point): Point = new Point(x + that.x, y + that.y, z + that.z)
  def -(that: Point): Point = new Point(x - that.x, y - that.y, z - that.z)
  def *(that: Point): Point = new Point(x * that.x, y * that.y, z * that.z)
  def /(that: Point): Point = new Point(x / that.x, y / that.y, z / that.z)
}

object Point {
  def apply(p3d: Point3D) = new Point(p3d.getX, p3d.getY, p3d.getZ)
}

IntelliJ is giving this code 92% coverage because the line object Point { is being counted in coverage, but I cannot figure out why, or how to satisfy its expectation. I have tried comparing it to something, and tried adding another method and calling that... No such luck, and I am out of ideas.

Here is the test code. Any help would be much appreciated!

import org.scalatest._

class test_Point extends FlatSpec with Matchers {
  val u = Point(100, 200, 300)
  val v = Point(623, -85, 300)

  it should "supply zero-valued defaults" in {
    val p = Point()
    p.x should be (0)
    p.y should be (0)
    p.z should be (0)
  }

  it should "be constructable from base class" in {
    val p = Point(new Point3D(0, 0, 0))
  }

  it should "implement + operator" in {
    val w = Point(u.x + v.x, u.y + v.y, u.z + v.z)
    u + v should be (w)
  }

  it should "implement - operator" in {
    val w = Point(u.x - v.x, u.y - v.y, u.z - v.z)
    u - v should be (w)
  }

  it should "implement * operator" in {
    val w = Point(u.x * v.x, u.y * v.y, u.z * v.z)
    u * v should be (w)
  }

  it should "implement / operator" in {
    val w = Point(u.x / v.x, u.y / v.y, u.z / v.z)
    u / v should be (w)
  }
}
submitted by Chronopolitan

September 02, 2015 04:53 PM


Planet Theory

Séminaire N. Bourbaki – Designs Exist (after Peter Keevash) – the paper



In June I gave a lecture at the Bourbaki seminar devoted to Keevash's breakthrough result on the existence of designs. Here is a draft of the paper: Designs exist (after P. Keevash).

Remarks, corrections and suggestions are most welcome!

I would have loved to expand a little on

1) How designs are connected to statistics

2) The algebraic part of Keevash’s proof

3) The “Rodl-style probabilistic part” (that I largely took for granted)

4) The greedy-random method in general

5) Difficulties when you move from graph decomposition to hypergraph decomposition

6) Wilson’s proofs of his theorem

7) Teirlink’s proof of his theorem

I knew at some point, in rough detail, both Wilson's proof (I heard 8 lectures about and around it from Wilson himself in 1978) and Teirlink's (Eran London gave a detailed lecture at our seminar), but I largely forgot; I'd be happy to see a good source.

8) Other cool things about designs that I should mention.

9) The Kuperberg-Lovett-Peled work

(To be realistic, adding something for half these items will be nice.)

Here is the seminar page (with videotaped lectures), and the home page of the Association des collaborateurs de Nicolas Bourbaki. You can find there cool links to old expositions going back to 1948, which overall give a very nice picture of modern mathematics and its highlights. Here is the link to my slides.

In my case (but probably also for some other Bourbaki speakers), it is not that I had a full understanding (or close to it) of the proof and just had to decide how to present it; rather, my presentation largely represents what I know, and the seminar forced me to learn. I was lucky that Peter gave a series of lectures (Video 1, Video 2, Video 3, Video 4) about it in the winter at our Midrasha, that he decided to write a paper "Counting designs" based on the lectures, and even luckier that Jeff Kahn taught some of it in class (based on Peter's lectures and subsequent article) and later explained some core ingredients to me. Here is a link to Keevash's full paper "The existence of designs", and an older post on his work.

[Two photos: street signs of rue Pierre et Marie Curie, 75005 Paris]

Curiously, the street was named after Pierre Curie alone until the 1960s, and next to the current street sign you can still see the older one.

by Gil Kalai at September 02, 2015 04:42 PM


Seriously frustrated about my career

Hi guys, I'm very sad to say that I'm not happy as a programmer, and I'd like to share and discuss my experience, as well as hear your perspective and opinion on it.

I got 'lucky' enough to try out programming when I was 12, and back then I decided that I wanted to do it all my life. I seemed seriously talented at algorithm design, and the idea of making computers do what I want rather than using other people's programs just seemed fun. To this day I still have fun doing my job (relatively; maybe the first couple of hours), yet I kind of think I fucked up.

I'm pretty negative about it right now. I don't like the kind of people I meet in this field of work, and I hate the environment and the schedule. I think I had one chance to work a bit harder and get into something 'cool' like medicine or aviation even if I'm not particularly talented, but instead I chose to sit in an office all day, every day. I think the 'funky' atmosphere of most software companies is actually there to hide something dark, and that several young programmers are young adults wasting their lives from 9am to 8pm after they wasted their teen years on video games.

If you are a young person considering the career, take my perspective into account, but obviously take it with a grain of salt. And most importantly, do some research on the different options you have instead of going for whatever you happened to be exposed to.

submitted by xHzMU9

September 02, 2015 04:37 PM


Transplanting Go packages for fun and profit

crazy Gopher scientist

A while back I read Coders at Work, which is a book of interviews with some great computer scientists who earned their stripes, the questions just as thoughtful as the answers. For one thing, it re-ignited my interest in functional programming; for another, I got interested in literate programming; but most of all, it struck me how common a recommendation it was to read other people's code as a means to become a better programmer. (It also has a good section of Brad Fitzpatrick describing his dislike for programming languages, and dreaming about his ideal language. This must have been shortly before Go came about and he became a maintainer.)

I hadn’t been doing a good job reading/studying other code out of fear that inferior patterns/style would rub off on me. But I soon realized that was an irrational, perhaps slightly absurd excuse. So I made the decision to change. Contrary to my presumption I found that by reading code that looks bad you can challenge and re-evaluate your mindset and get out with a more nuanced understanding and awareness of the pros and cons of various approaches.

I also realized if code is proving too hard to get into or is of too low quality, you can switch to another code base with negligible effort and end up spending almost all of your time reading code that is worthwhile and has plenty of learnings to offer. There is a lot of high quality Go code, easy to find through sites like Github or Golang weekly, just follow your interests and pick a project to start reading.

It gets really interesting though once you find bodies of code that are not only a nice learning resource, but can be transplanted into your code with minimal work to solve a problem you're having, in a different context than the one the author of the code originally designed it for. Components often grow and mature in the context of an application without being promoted as reusable libraries, but you can often use them as if they were. I would like to share 2 such success cases below.

Nsq’s diskqueue code

I’ve always had an interest in code that manages the same binary data both in memory and on a block device. Think filesystems, databases, etc. There’s some interesting concerns like robustness in light of failures combined with optimizing for performance (infrequent syncs to disk, maintaining the hot subset of data in memory, etc), combined with optimizing for various access patterns, this can be a daunting topic to get into.

Luckily there’s a use case that I see all the time in my domain (telemetry systems) and that covers just enough of the problems to be interesting and fun, but not enough to be overwhelming. And that is: for each step in a monitoring data pipeline, you want to be able to buffer data if the endpoint goes down, in memory and to disk if the amount of data gets too much. Especially to disk if you’re also concerned with your software crashing or the machine power cycling.

This is such a common problem that applies to all metrics agents, relays, etc that I was longing for a library that just takes care of spooling data to disk for you without really affecting much of the rest of your software. All it needs to do is sequentially write pieces of data to disk and have a sequential reader catching up and read newer data as it finishes processing the older.

NSQ is a messaging platform from bitly, and it has diskqueue code that does exactly that. And it does so oh so elegantly. I had previously found a beautiful pattern in bitly’s go code that I blogged about and again I found a nice and elegant design that builds further on this pattern, with concurrent access to data protected via a single instance of a for loop running a select block which assures only one piece of code can make changes to data at the same time (see bottom of the file), not unlike ioloops in other languages. And method calls such as Put() provide a clean external interface, though their implementation simply hooks into the internal select loop that runs the code that does the bulk of the work. Genius.

func (d *diskQueue) Put(data []byte) error {
  // some details
  d.writeChan <- data
  return <-d.writeResponseChan
}
In addition the package came with extensive tests and benchmarks out of the box.

After finding and familiarizing myself with this diskqueue code about a year ago I had an easy time introducing disk spooling to Carbon-relay-ng, by transplanting the code into it. The only change I had to make was capitalizing the Diskqueue type to export it outside of the package. It has proven a great fit, enabling a critical feature through little work of transplanting mature, battle-tested code into a context that original authors probably never thought of.

Note also how the data unit here is the []byte: the queue does not deal with the higher-level nsq.Message (!). The authors had the foresight to keep this generic, enabling code reuse, and rightfully shot down a PR of mine that had a side effect of making the queue aware of the Message type. In NSQ you'll find thoughtful and deliberate API design and pretty sound code all around. Also, they went pretty far in detailing some lessons learned and providing concrete advice, a very interesting read, especially around managing goroutines & synchronizing their exits, and performance optimizations. At Raintank, we had a need for a messaging solution for metrics, so we will soon be rolling out NSQ as part of the raintank stack. This is an interesting case where my past experience with the NSQ code and ideas helped us adopt the full solution.

Bosun expression package

I’m a fan of the bosun alerting system which came out of Stack Exchange. It’s a full-featured alerting system that solves a few problems like no other tool I’ve seen does (see my linked post), and timeseries data storage aside, comes with basically everything built in to the one program. I’ve used it with success. However, for litmus I needed an alerting handler that integrated well into the Grafana backend. I needed the ability to do arbitrarily complex computations. Graphite’s api only takes you so far. We also needed (desired) reduction functions, boolean logic, etc. This is where bosun’s expression language is really strong. I found the expression package quite interesting, they basically built their own DSL for metrics processing. so it deals with expression parsing, constructing AST’s, executing them, dealing with types (potentially mixed types in the same expression), etc.

But bosun also has incident management, contacts, escalations, etc. Stuff that we either already had in place, or didn’t want to worry about just yet. So we could run bosun standalone and talk to it as a service via its API which I found too loosely coupled and risky, hook all its code into our binary at once - which seemed overkill - or the strategy I chose: gradually familiarize ourself and adopt pieces of Bosun on a case by case basis, making sure there’s a tight fit and without ever building up so much technical debt that it would become a pain to move away from the transplanted code if it becomes clear it’s not/no longer well suited. For the foreseeable future we only need one piece, the expression package. Potentially ultimately we’ll adopt the entire thing, but without the upfront commitment and investment.

So practically, our code now simply has one line where we create a bosun expression object from a string, and another where we ask bosun to execute the expression for us, which takes care of parsing the expression, querying for the data, evaluating and processing the results and distilling everything down into a final result. We get all the language features (reduction functions, boolean logic, nested expressions, …) for free.

This transplantation was again probably not something the bosun authors expected, but for us it was tremendously liberating. We got a lot of power for free. The only thing I had to do was spend some time reading code, and learning in the process. And I knew the code was well tested so we had zero issues using it.

Much akin to the NSQ example above, there was another reason the transplantation went so smoothly: the expression package is not tangled into other stuff. It just needs a string expression and a graphite instance. To be precise, any struct instance that satisfies the graphiteContext interface that is handily defined in the bosun code. While the bosun design aims to make its various clients (graphite, opentsdb, ...) applicable for other projects, it also happens to let us do the opposite: reuse some of its core code - the expression package - and pass in a custom graphite Context, such as our implementation which has extensive instrumentation. This lets us use the bosun expression package as a "black box" and still inject our own custom logic into the part that queries data from graphite. Of course, once we want to change the logic of anything else in the black box, we will need to come up with something else, perhaps fork the package, but it doesn't seem like we'll need that any time soon.


If you want to become a better programmer I highly recommend you go read some code. There’s plenty of good code out there. Pick something that deals with a topic that is of interest to you and looks mature. You typically won’t know if code is good before you start reading but you’ll find out really fast, and you might be pleasantly surprised, as was I, several times. You will learn a bunch, possibly pretty fast. However, don’t go for the most advanced, complex code straight away. Pick projects and topics that are out of your comfort zone and do things that are new to you, but nothing too crazy. Once you truly grok those, proceed to other, possibly more advanced stuff.

Often you’ll read reusable libraries that are built to be reused, or you might find ways to transplant smaller portions of code into your own projects. Either way is a great way to tinker and learn, and solve real problems. Just make sure the code actually fits in so you don’t end up with the software version of Frankenstein’s monster. It is also helpful to have the authors available to chat if you need help or have issues understanding something, though they might be surprised if you’re using their code in a way they didn’t envision and might not be very inclined to provide support to what they consider internal implementation details. So that could be a hit or miss. Luckily the people behind both nsq and bosun were supportive of my endeavors but I also made sure to try to figure out things by myself before bothering them. Another reason why it’s good to pick mature, documented projects.

Gopher frankenstein

Part of the original meaning of hacking, extended into open source, is a mindset and practice of seeing how others solve a problem, discussion and building on top of it. We’ve gotten used to - and fairly good at - doing this on a project and library level but forgot about it on the level of code, code patterns and ideas. I want to see these practices come back to life.

We also apply this at Raintank: not only are we trying to build the best open source monitoring platform by reusing (and often contributing to) existing open source tools and working with different communities, we realize it’s vital to work on a more granular level, get to know the people and practice cross-pollination of ideas and code.

Next stuff I want to read and possibly implement or transplant parts of: dgryski/go-trigram, armon/go-radix, especially as used in the dgryski/carbonmem server to search through Graphite metrics. Other fun stuff by dgryski: an implementation of the ARC caching algorithm and bloom filters. (you might want to get used to reading Wikipedia pages also). And mreiferson/wal, a write ahead log by one of the nsqd authors, which looks like it’ll become the successor of the beloved diskqueue code.

Go forth and transplant!

Also posted on the Raintank blog

by Posts on Dieter&#039;s blog at September 02, 2015 04:25 PM


How to construct a grammar that generates language L?

I'm in a Formal languages class and have a grammar quiz coming up. I'm assuming something like this will appear.

Consider the alphabet $\Sigma$ = {a, b, c}. Construct a grammar that generates the language $L = \{bab^nabc^na^p \mid n ≥ 0, p ≥ 1\}$. Assume that the start variable is $S$.
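One possible construction (my own sketch, worth double-checking): generate the fixed prefix $ba$, pair each extra $b$ with a $c$ around the middle $ab$, and generate $a^p$ with a separate variable:

$$S \to baXA \qquad X \to bXc \mid ab \qquad A \to aA \mid a$$

For example, with $n = 1, p = 2$: $S \Rightarrow baXA \Rightarrow ba(bXc)A \Rightarrow ba\,b\,ab\,c\,A \Rightarrow bababcaa = bab^1abc^1a^2$, and $X \to ab$ directly handles the $n = 0$ case.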

by Terry Light at September 02, 2015 04:23 PM

Planet Theory

News for August

This month saw more development on testing properties of distributions and a result with intriguing connections to property testing. And for readers who may have missed it, Clément Canonne and Gautam Kamath wrote an engaging survey of some recent work on testing properties of distributions here.

Optimal algorithms and lower bounds for testing closeness of structured distributions by Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin (arXiv). One of the fundamental results in testing properties of distributions is that if we want to estimate the (\(L_1\)) distance between two unknown distributions on a domain of size \(n\) up to some constant additive factor,  we need to draw \(O(n^{2/3})\) samples from these distributions, and this sample complexity is tight in general. But what if we consider the same problem in the setting where we are promised that the distributions come from some (known) class of distribution? This paper shows that, for many natural classes of distributions, we can obtain much more efficient algorithms for testing the closeness of distributions in this setting.

Smooth Boolean functions are easy: efficient algorithms for low-sensitivity functions by Parikshit Gopalan, Noam Nisan, Rocco A. Servedio, Kunal Talwar, and Avi Wigderson (arXiv). The (maximum) sensitivity of a Boolean function \(f : \{0,1\}^n \to \{0,1\}\) is the size of the largest set \(S \subseteq [n]\) such that there is an input \(x \in \{0,1\}^n\) for which \(f(x) \neq f(y)\) for every input \(y\) obtained by flipping the value of the \(i\)th coordinate of \(x\) for some \(i \in S\). One of the results in this paper shows that functions with low sensitivity can be locally self-corrected: given query access to a function \(r : \{0,1\}^n \to \{0,1\}\) that is close to a function \(f\) with low sensitivity (think of \(r\) as a corrupted version of \(f\)), there is an algorithm that, for any input \(x \in \{0,1\}^n\), queries \(r\) on a few inputs and outputs with high probability the value \(f(x)\). This notion of local self-correction is of course closely related to locally-decodable codes; it is also one of the fundamental techniques used to obtain many results in property testing (see for example Chapter 3 of Dana Ron’s survey). Can this result, or the techniques used to obtain it, also yield new results in property testing?


by Eric Blais at September 02, 2015 04:14 PM


Full Kelly portfolios having same weights as tangency portfolios

I'm currently comparing empirically the differences between Markowitz and Kelly portfolios. I calculated the Kelly weights for monthly return observations over 10 years for a sample of 50 stocks from the S&P 500. Without constraints, I received a highly levered Kelly portfolio with weights summing to 23, a mean of 85%, and a std. deviation of 91%. I then de-levered the portfolio proportionally and received the same tangency portfolio with the exact same weights as in the mean-variance-optimization case. I was surprised by this result and wanted to ask if someone knows why this could be the case or has made similar observations.

I’m very grateful for any kind of advice on this subject.

by User1111 at September 02, 2015 04:12 PM


What is the difference of temporal dynamics in RNNs and the NEF

Both spiking neural networks created with the Neural Engineering Framework (NEF) and Recurrent Neural Networks (RNNs) can be connected recurrently to exhibit neural dynamics. What is the difference between the set of dynamics that they can approximate and/or exhibit?

by Seanny123 at September 02, 2015 04:05 PM


Iceland's handling of the financial crisis has progressed so far ...

Iceland's handling of the financial crisis has progressed so far that the government now wants to sell its stake in Landsbanki. The bank had to be nationalized in an emergency back then, and the (probably more than 70% of all) shares were then held by the state rescue fund ISFI. Formally there are two prerequisites for the sale: the budget law must provide for it (which is the case), and ISFI must submit a formal proposal after an assessment of the market environment (which has not yet happened).

In any case, this is a milestone for Iceland. It is interesting, by the way, that in a recent poll only a good 40% were in favor of a sale; the majority would rather keep the bank state-owned.

September 02, 2015 04:01 PM

You surely know this one, the classic mailing-list beginner's mistake. ...

You surely know this one, the classic mailing-list beginner's mistake: all recipients in To or Cc instead of Bcc. It can get really embarrassing, for example when a clinic for sexually transmitted diseases does it with its HIV patients.

September 02, 2015 04:01 PM


Measure purity of a function

Is there an existing solution to detect whether a function is pure or impure, or to measure its degree of purity using some metric?

The case is, I want to transition my existing app code to a more functional style, and one of the goals is to reduce the impurity of functions to the bare minimum. I can do it manually, but if it can be automated, why not?

I know cyclomatic complexity can be measured with tools like ESLint, so I think purity can be too.
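As a proof of concept, a very naive static check is easy to sketch in Python with the ast module (the impure-call denylist and the heuristics below are assumptions for illustration, not an established metric):

```python
import ast

IMPURE_CALLS = {"print", "open", "input"}  # assumed denylist; extend as needed

def is_probably_pure(source):
    """Heuristic purity check for a single function definition: reject
    global/nonlocal declarations, calls to known-impure builtins, and
    attribute mutation. A rough signal, not a proof of purity."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a function definition")
    for node in ast.walk(func):
        if isinstance(node, (ast.Global, ast.Nonlocal)):
            return False
        if isinstance(node, ast.Call):
            f = node.func
            if isinstance(f, ast.Name) and f.id in IMPURE_CALLS:
                return False
        if isinstance(node, ast.Attribute) and isinstance(node.ctx, ast.Store):
            return False  # mutates an object's state
    return True
```

A real tool would also have to track aliasing, higher-order calls, and I/O reachable through callees, which is why purity analysis is genuinely hard in dynamic languages.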

by Sergei Basharov at September 02, 2015 04:00 PM

High Scalability

Building Globally Distributed, Mission Critical Applications: Lessons From the Trenches Part 2

This is Part 2 of a guest post by Kris Beevers, founder and CEO, NSONE, a purveyor of a next-gen intelligent DNS and traffic management platform. Here's Part 1.

Integration and functional testing is crucial

Unit testing is hammered home in every modern software development class.  It’s good practice. Whether you’re doing test-driven development or just banging out code, without unit tests you can’t be sure a piece of code will do what it’s supposed to unless you test it carefully, and ensure those tests keep passing as your code evolves.

In a distributed application, your systems will break even if you have the world’s best unit testing coverage. Unit testing is not enough.

You need to test the interactions between your subsystems. What if a particular piece of configuration data changes – how does that impact Subsystem A’s communication with Subsystem B? What if you changed a message format – do all the subsystems generating and handling those messages continue to talk with each other? Does a particular kind of request that depends on results from four different backend subsystems still result in a correct response after your latest code changes?

Unit tests don’t answer these questions, but integration tests do. Invest time and energy in your integration testing suite, and put a process in place for integration testing at all stages of your development and deployment process. Ideally, run integration tests on your production systems, all the time.
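As a minimal sketch of the difference (hypothetical subsystems and message format, not NSONE's code): each function below could pass its unit tests while the pair still disagreed on the wire format — only a test exercising the real message path catches that.

```python
import json

# Hypothetical subsystems standing in for two services that share a
# message format over a queue or HTTP -- names are made up.
def subsystem_a_emit(event_type, payload):
    """Producer side: serialize an event for the wire."""
    return json.dumps({"type": event_type, "payload": payload})

def subsystem_b_handle(message):
    """Consumer side: parse the wire format and act on it."""
    msg = json.loads(message)
    if msg["type"] == "user_created":
        return "welcome " + msg["payload"]["name"]
    raise ValueError("unknown message type: " + msg["type"])

def test_a_to_b_message_format():
    # Integration test: push a real message through the real path.
    # If A renames "payload" or B changes its dispatch key, this fails
    # even though each function's own unit tests might still pass.
    wire = subsystem_a_emit("user_created", {"name": "alice"})
    assert subsystem_b_handle(wire) == "welcome alice"
```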

There is no such thing as a service-interrupting maintenance

by Todd Hoff at September 02, 2015 03:56 PM


Commonality among greatest evidences for variations of class separations [on hold]

There are many classes which are conjectured to be separate in complexity theory; for instance, the famous $P$ versus $NP$ problem.

What are famous examples of such classes which are conjectured to be separate, and what is commonly considered the strongest evidence for and against such conjectures? What is the main theme behind this evidence?

Note that this post is not related to known barriers in class separation.

Even evidence supporting remote possibilities such as $P=NP$ is welcome.

I think a survey of such evidence would be useful.

by Turbo at September 02, 2015 03:50 PM




type erasure and overloading

I read that overloading has some restrictions because of type erasure. For instance,

def f(s: Seq[Int]) = ...
def f(s: Seq[Double]) = ...

are not both allowed.

Why is that? Type erasure is a runtime thing and overloading a compile-time thing. So, couldn't the compiler define two different functions and decide at compile time which one to call?

submitted by Kiuhnm
[link] [5 comments]

September 02, 2015 03:18 PM


Why higher order procedures?

So if a language provides higher order procedure then I can have procedure that returns procedure. Something like:

(define (Proc a b c)
  (lambda (x) ( #| method body here in terms of a b c and x |# )))

To create new procedure, I would just do something like:

(define ProcA (Proc a1 b1 c1)) ; Would create ProcA that has 1 argument

Similar task could be done in a language which does not support higher order procedure by defining Proc that takes 4 instead of 3 arguments and calling this procedure to define ProcA, like:

(define (Proc a b c x) ( #| method body -- does not return any procedure |# ))
(define (ProcA x) (Proc a1 b1 c1 x))

So why is there so much fuss about higher-order procedures? Am I missing something?
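The same pattern in Python, with a quadratic standing in for the method body:

```python
def make_proc(a, b, c):
    """Higher-order: returns a fresh one-argument procedure that closes
    over a, b and c. (A quadratic stands in for the method body.)"""
    def proc(x):
        return a * x * x + b * x + c
    return proc

# Equivalent of (define ProcA (Proc a1 b1 c1)):
proc_a = make_proc(1, 2, 3)
```

The difference from the 4-argument workaround is that a, b and c are packaged inside the returned value: proc_a can be handed to code that has never heard of a1, b1, c1, whereas the workaround needs those names in scope at every definition site.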

by StackUnderflow at September 02, 2015 03:12 PM





Recurrence Relation(with Square root)

I came across a very peculiar recurrence relation :

$\sqrt {T(n)} = \sqrt {T(n-1)} + 2 \sqrt {T(n-2)} $

with initial values $T(0) = T(1) = 1$

Any help on how to solve it would be appreciated.
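A standard substitution (a sketch of one approach) turns this into a linear recurrence: let $S(n) = \sqrt{T(n)}$, so that

$$S(n) = S(n-1) + 2S(n-2), \qquad S(0) = S(1) = 1.$$

The characteristic equation $x^2 = x + 2$ has roots $x = 2$ and $x = -1$, so $S(n) = A \cdot 2^n + B \cdot (-1)^n$. The initial conditions give $A + B = 1$ and $2A - B = 1$, hence $A = 2/3$, $B = 1/3$, and

$$S(n) = \frac{2^{n+1} + (-1)^n}{3}, \qquad T(n) = S(n)^2 = \left(\frac{2^{n+1} + (-1)^n}{3}\right)^2.$$

As a check, $S(2) = S(1) + 2S(0) = 3$, and the closed form gives $(2^3 + 1)/3 = 3$.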

by Kishan Kumar at September 02, 2015 02:52 PM

Differences between SISD, SIMD and MIMD architecture (Flynn classification)

I have a problem with classifying certain CPUs to the proper classes of Flynn's Taxonomy.

1. Zilog Z80

According to this article on Sega Retro, Z80 has limited abilities to be classified as SIMD:

a limited ability for SIMD (Single Instruction, Multiple Data) with instructions to perform copy, compare, input, and output over contiguous blocks of memory;

From what I understand, the Z80 usually behaves as SISD, but when it comes to performing things like copying or comparing, the Z80 is able to process multiple data using a single instruction.

How should we classify the Z80 then? Is this limited SIMD ability an argument for or against saying that the Z80 implements a SIMD architecture?

2. Intel i5 (Dual core)

From what I understand, we classify multicore CPUs as MIMD.

Is it as simple as that?

3. ARM Cortex-A15 (single core)

I'd classify the architecture of this processor as a SIMD model. Wikipedia says that it has a superscalar pipeline, but as we know from Why is a superscalar processor SIMD?, multiple pipelines do not imply the MIMD model.

Are "modern" single cores usually implementing SIMD model or not?

by moonnoon at September 02, 2015 02:49 PM

Dave Winer

Progress on my Slack clone

I spent the last 1.5 weeks working on a very simple chat app that implements a subset of the Slack API.

It works!

I baked the server side of the code into a new version of nodeStorage, which is a convenient place to put it because it already has full support for Twitter identity and of course storage, both of which come in handy for this app.

I'm not expecting to release the client soon. I may use it as a chat function for Scripting News, but first I want to let the whole thing settle in.

However, I will include the server-side for webhooks in a new release of nodeStorage. That to me is the really interesting part. Chat programs are commonplace, but such a nice simple API implemented in Node.js seems like it might have value to others. (Update: The new version of nodeStorage has been released.)

It's been a really fun project.

PS: I wrote about this project last month.

September 02, 2015 02:40 PM



PL constructs and/or formalisms that work well for iteration through two (or more) collections?

Lately, have been pondering a question about good ways to specify and/or code iteration. Some background:

One thing we have gotten very good at dealing with in recent years is iteration through a collection, both formally and in code. Once upon a time the counter-based for-loop held unquestioned dominance in programming land, but nowadays virtually every PL has some notion of an iterable sequence and some support for higher-order abstractions. We map and filter and zip and fold. We use comprehensions. We compose various of these. Etc. Only in "C" does the old way still rule.

Ah, but what if we have two collections? Obviously, we can concatenate. We can do a Cartesian product by nesting constructs. And we can iterate in lockstep using a zip or something similar. But there are other things we might want to do.

I'm thinking primarily of set-like operations on (usually sorted) sequences. Quintessential examples are std::set_intersection and its relatives from the C++ Standard Template Library. The stable merge operation at the heart of Merge Sort is another example. In each of these operations, we are given two iterable sequences, and we traverse each just once, in forward sequential order. But we may not handle the two sequences at the same rate.

As an experiment, I tried implementing the set-like operations on sequences in a variety of programming languages. I generally found them at least somewhat annoying to write in all PLs: messy, inconvenient, error-prone. Consistently, the standard constructs of the language were a relatively poor match for the problem I was solving.

So, my question: has anyone looked at this problem? Has there been significant progress? In particular, are there any programming-language constructs, or formalisms, that have been found to be a good match for this multiple-iteration idea?
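For concreteness, here is the two-cursor pattern in Python (a sketch of std::set_intersection's logic): the cursors advance at different rates, which is exactly what zip-style lockstep constructs don't express.

```python
def sorted_intersection(xs, ys):
    """Intersect two sorted sequences in one forward pass over each.
    The two cursors advance independently -- sometimes xs moves,
    sometimes ys, sometimes both -- which is the bookkeeping that
    map/filter/zip don't capture directly."""
    result = []
    i = j = 0
    while i < len(xs) and j < len(ys):
        if xs[i] < ys[j]:
            i += 1
        elif ys[j] < xs[i]:
            j += 1
        else:
            result.append(xs[i])
            i += 1
            j += 1
    return result
```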

(Apologies for the lousy post title. I could not think of a concise way to explain what I am wondering about. Still can't, actually.)

submitted by ggchappell
[link] [1 comment]

September 02, 2015 02:31 PM


How to reduce repetition of prime factors of a number in this method

I tried to solve Project Euler #3 as per the documented solution but it is resulting in repetition of prime factors.

In the above example for the factors of 10, 2 is repeating multiple times (3 times) whereas 5 isn't repeating (I understood the reason behind it), but how do I avoid this and print only the factors of 10, just 2 and 5?
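One common fix (a sketch, not the missing code from the post): divide each factor out completely before moving to the next candidate, so every prime is recorded exactly once.

```python
def prime_factors(n):
    """Distinct prime factors of n, each listed once: when a divisor is
    found, it is divided out fully before advancing, so repeats of the
    same prime never reappear."""
    factors = []
    d = 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:  # whatever remains is itself prime
        factors.append(n)
    return factors
```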

by chintu srikanth at September 02, 2015 02:25 PM




Consecutive ones property in rows and columns

The clique-vertex matrix of an interval graph has the consecutive-ones property along the columns.

Which graphs have adjacency matrices with the consecutive-ones property in both rows and columns?

by user35368 at September 02, 2015 02:12 PM

Graphs with consecutive ones property in both rows and columns

Consecutive ones property of the rows of adjacency matrix of a graph gives a characterisation of interval graphs.

Which graphs have adjacency matrices with the consecutive-ones property in both rows and columns?

by user35262 at September 02, 2015 02:07 PM



Proposal of the year: Use Al Qaeda Fighters to Beat ...

Proposal of the year: Use Al Qaeda Fighters to Beat ISIS! It comes from ex-General Petraeus, and he certainly knows a thing or two about bad ideas and the pointless endless wars that follow from them. Warm regards to the photo desk for this contribution :-)

September 02, 2015 02:01 PM


Is there a stand-alone statistical ZK argument with concurrent knowledge extraction?

Is there any known construction of an interactive argument of knowledge that

  • is stand-alone statistical zero-knowledge, and
  • allows concurrent knowledge extraction?

This is a weakening of my previous question from 6 months ago, and is something that would presumably be easier to construct.

by Ricky Demer at September 02, 2015 01:54 PM

Quasi-polynomial time algorithm for graph isomorphism

GI is arguably the best-known candidate for an NP-intermediate problem. The best known algorithm is a sub-exponential one with run-time $2^{O(\sqrt{n \log n})}$. However, I am not aware of any quasi-polynomial algorithm for it. GI is not $NP$-complete unless the polynomial hierarchy collapses. Minimum Dominating Set in Tournaments, Group Isomorphism, and Tournament Isomorphism are solvable in quasi-polynomial time ($O(n^{\log n})$). The last two problems are polynomial-time reducible to GI.

Is there any complexity theoretic conjecture that would rule out quasi-polynomial time algorithms? Can we efficiently reduce Minimum Dominating set in Tournaments to GI? Is there any reason to believe that GI can not be hard for quasi-polynomial time?

by Mohammad Al-Turkistany at September 02, 2015 01:41 PM


Reading from screen of laptop/pc/mobile(bluestack) [on hold]

Here is my question. Is there any programming language or software which can read from a specific portion of the screen?


Like in this image (think of it as opened from BlueStacks), there is "inbox" written at the top.

So if I give x and y coordinates enclosing those letters, then the output should be "inbox is written at those coordinates".

by Kesh at September 02, 2015 01:28 PM


Lamer News


Dave Winer

CitiBike goes north

Things are going to get interesting as CitiBike goes north up the west side of Manhattan.

The yellow markers in the map below are stations that haven't been installed yet. But some of the new ones have already been installed.

A picture named citibikemap.png

Places you'll be able to get to by bike: Lincoln Center, Zabar's, Barney Greengrass, the Museum of Natural History, the Metropolitan Museum, 72nd St stores, the Marina, a bunch of new places on the edges of the park. I only wish they put a station near the fountain in the middle of the park. It would probably have to be huge it would be so well-used. (And it would lead to lots of abuse by bikes on walking paths, which is probably why they didn't do it.)


If you have a restaurant or shop in Manhattan, west Brooklyn or Queens, pay for a CitiBike station outside your business, and let your customers know they can conveniently ride a bike to come eat or shop there. It'll be like having a subway stop in front of your store, soon enough. An investment for the future of the city.

September 02, 2015 12:37 PM



What's the `name` argument to `sh` for/do?

Taken from FreeBSD's man page on sh (because it's the most convenient online; the target platform is Debian if it matters):

SH(1)           FreeBSD General Commands Manual          SH(1)

     sh -- command interpreter (shell)

     sh [-/+abCEefhIimnPpTuVvx] [-/+o longname] [script [arg ...]]
     sh [-/+abCEefhIimnPpTuVvx] [-/+o longname] -c string [name [arg ...]]
     sh [-/+abCEefhIimnPpTuVvx] [-/+o longname] -s [arg ...]

I'm particularly interested in the use case of:

sh [-/+abCEefhIimnPpTuVvx] [-/+o longname] -c string [name [arg ...]]


# Normal:
myscript hello world
myscript hello world >/var/log/myscript.combined 2>&1

# Constrained:
#  - Arguments 'hello' and 'world' are suffixed by calling program.
myscript >/var/log/myscript.combined 2>&1 hello world

# Constrained Further:
#  - Wrapped in `sh` to catch system messages, like Segmentation Fault

sh -c 'myscript $@' _ >/var/log/myscript.combined 2>&1 hello world

I noticed that the first argument was not passed to myscript, and checking the documentation alludes to a name parameter that I didn't see a doc section on. In my example I have added a _ in place of the name argument, but:

What should I fill for name?

by ThorSummoner at September 02, 2015 12:07 PM


What kind of extra optimizations can a JIT compiler perform to a dynamically typed language?

For dynamically typed languages such as python, ruby etc., the usual approach to maximize runtime performance is by just-in-time compilation. pypy would be a good example.

It is possible to achieve dynamic typing with a statically typed language such as C++ by attaching a type code to every object, and the functions can operate for the correct type by runtime branching. That is, you can switch the type code.

I once did an experiment in C++ by creating a dynamic type that can hold either an integer, floating point number, a pointer to an arbitrary precision integer, or a pointer to a dynamically allocated array. In this class there is an integer representing the type code and a union that holds an int, double, and void *. With this, I could directly translate a piece of python code leaving all the dynamic properties there.
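That experiment can be mimicked in Python itself (a sketch of the tagged-union idea, not the original C++): every operation pays for a branch on the type codes, and it is precisely this per-operation dispatch that a tracing JIT can hoist out of a hot loop once it observes that the tags stay stable.

```python
INT, FLT = 0, 1  # type codes for the tagged union

class Dyn:
    """Tagged union like the C++ experiment: a type code plus a payload,
    dispatched on at runtime."""
    __slots__ = ("tag", "val")
    def __init__(self, tag, val):
        self.tag = tag
        self.val = val

def dyn_add(a, b):
    # Runtime branch on the tags. A tracing JIT that sees only INT+INT
    # in a hot loop can specialize the trace to the integer path and
    # guard on the tags once, instead of branching on every operation --
    # a static compiler of this source cannot, because the tags are
    # only known at run time.
    if a.tag == INT and b.tag == INT:
        return Dyn(INT, a.val + b.val)
    return Dyn(FLT, float(a.val) + float(b.val))
```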

The speed improvement after this translation was about 20~40% depending on the type of code. But the interesting part is that pypy often performed unbeatably better than the C++ translation. Of course fully optimized C++ code with its usual static type information will be the fastest, but I realized that the pypy optimizer applies some special optimizations that a typical static compiler doesn't.

I'm no expert in this area, but I'm still interested in what kind of extra optimizations a JIT compiler can perform that a static compiler cannot do easily, especially to dynamically typed languages.

by xiver77 at September 02, 2015 12:01 PM


Turkey wants to press charges against two journalists from ...

Turkey wants to press charges against two Vice News journalists because encryption software of a kind frequently used by ISIS was found on the laptop of their local guide.

September 02, 2015 12:01 PM


Converting domain-specific natural language to database query

I am interested in building something similar to what is shown in this video.

For those unable to watch the video: it shows a system that takes natural language queries (which are all related to basketball) and converts them to database queries.

Examples from the video:

"lebron james most points"
"Who was the last player to average 24+ ppt in a season"
"lebron ppg by month last season"

My plan was to build a grammar, where different rules would map to parts of a database query. To process a new natural language query, I would then stitch together these different query-parts by traversing the tree in-order (or maybe post-order) to construct a query.

My thoughts are that this is a pretty naive approach, and I am looking for a way to bolster it.
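A toy version of the rule-to-query-fragment idea (the table, column, and player names are made up for illustration): each matched pattern contributes one fragment, and the fragments are stitched into a query.

```python
import re

# Hypothetical player lexicon; a real system would use a proper entity list.
PLAYERS = {"lebron james", "lebron"}

def to_query(text):
    """Map a natural-language basketball query to SQL by stitching
    together fragments contributed by matched rules. A naive sketch of
    the grammar-rules-to-query-parts plan described above."""
    text = text.lower()
    where, order = [], ""
    for p in PLAYERS:  # entity rule -> WHERE fragment
        if p in text:
            where.append("player = 'lebron james'")
            break
    m = re.search(r"most (\w+)", text)  # superlative rule -> ORDER BY fragment
    if m:
        order = " ORDER BY {} DESC LIMIT 1".format(m.group(1))
    q = "SELECT * FROM stats"
    if where:
        q += " WHERE " + " AND ".join(where)
    return q + order
```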

by TylerAndFriends at September 02, 2015 11:36 AM


Who introduced nondeterministic computation?

I have two historical questions:

Who first described nondeterministic computation?

I know that Cook described NP-complete problems, and that Edmonds proposed that P algorithms are "efficient" or "good" algorithms.

I searched this Wikipedia article and skimmed "On the Computational Complexity of Algorithms," but couldn't find any reference to when nondeterministic computation was first discussed.

What was the first reference to the class NP? Was it Cook's 1971 paper?

by Philip White at September 02, 2015 11:31 AM

Dave Winer

Launching a Mac app from Node?

Back in 1999, I needed to keep Frontier running on a Mac server, so I wrote a little AppleScript that watched to see if the app was still running, and if not, launch it.

I need the same functionality today, but I'd rather use Node than AppleScript, if possible.

The question

Is there a way to make system calls to the Mac OS from Node that allow you to:

  1. Tell if an app is still running.

  2. Launch an app.

Any help much appreciated.

PS: Cross-posted on Facebook.


Ted Howard posted a solution on the Frontier-User list.

It took a few tries but I got it working on my Mac.

Problem solved.

September 02, 2015 11:22 AM



Trying to understand a paper on ksvd algorithm (dictionary learning) by Elad, et al

Trying to understand a paper titled KSVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation by M. Elad et al.
My take on section IV.C (the detailed description of K-SVD) is that we have an objective function which we wish to minimize with respect to two variables, D (called the dictionary) and X (the coefficient matrix): $$\min_{X,D} \{||Y-DX||_F^2\} \quad \textrm{ subject to} \quad \forall i, ||x_i||\leq T_0$$

One method of doing this is to divide it into two stages, sparse coding and dictionary update. In the sparse coding stage, we assume D is fixed and solve the above problem w.r.t. X using a pursuit algorithm. Then we must update the dictionary to better fit our data, using the coefficient matrix from the previous stage. In the update stage, everything is kept fixed except one column of D and its corresponding row in X, which remain in question. After simplifying the objective function and decomposing the product DX into a sum of k rank-1 matrices, one might suggest using the SVD. But that would ruin the sparsity of X, so we need to set constraints. At this point, the authors define some variables and use them for setting constraints. The problem is, I don't understand what is meant by this statement:

examples that use the atom $d_k$,i.e. those where $x_T^{k} (i)$ is nonzero.

Any clarifications would be appreciated.

by Gigili at September 02, 2015 11:10 AM


Fred Wilson

What Are We Doing?

I’ve been told over the years that the number one reason employees leave a company for another job is that they don’t know where the company is headed and they don’t know what their role in that roadmap is.

Apparently a bad boss and below market comp and long working hours all come in lower than that one.

Which tells me that if people believe in you and your plan and have a role to play in it, they will put up with a lot of other stuff.

I am not advocating for a hostile workplace or below market comp.

What I am advocating for is the value of having a clear and intelligent plan, communicating it early and often, and, most importantly, mapping that plan to each team and each person in your organization.

This is pretty easy when you are five people. It gets harder when you are fifty people. And it’s really hard when you are 500 people.

The communication plan is very important particularly as the size of the company grows. And making it matter to everyone is quite challenging.

But it all starts with a plan. I know companies that are great at communicating but don’t really have a coherent plan. All the communicating in the world won’t help them.

So figure out where you are taking your company. Answer the basic question on everyone’s minds, “what are we doing?”, and you will be on a path to building a loyal, hardworking, and motivated team.

by Fred Wilson at September 02, 2015 11:05 AM



Likelihood of a caplet ending in the money

With what likelihood would one expect an ATM caplet to end up in the money? Just as a very rough guess, from real-world experience.

When I consider N(d2) from the Black formula, for spot = strike = 4%, vola = 20%, T = 1, tenor = 12m, I get something around 46.*%.
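That figure checks out: for an ATM caplet, $\log(F/K) = 0$, so $d_2 = -\tfrac{1}{2}\sigma\sqrt{T} = -0.1$ and $N(d_2) \approx 0.460$. A quick sketch in Python:

```python
from math import erf, log, sqrt

def black_prob_itm(F, K, sigma, T):
    """Black-model risk-neutral probability N(d2) that a caplet on
    forward rate F with strike K expires in the money."""
    d2 = (log(F / K) - 0.5 * sigma ** 2 * T) / (sigma * sqrt(T))
    return 0.5 * (1.0 + erf(d2 / sqrt(2.0)))  # standard normal CDF via erf

# ATM: spot = strike = 4%, vol = 20%, T = 1  ->  roughly 0.460
p = black_prob_itm(0.04, 0.04, 0.20, 1.0)
```

Note that this is the probability under the pricing measure, not the physical measure the qualitative argument below is about, so the two numbers need not agree.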

On the other hand, when I think about it qualitatively: Under most market conditions, the forward rate curve is increasing. As a very, very rough argument, we would expect the spot rate 1 year into the future to be similar to today's spot rate rather than today's 1 year forward rate, i.e. to be lower than today's ATM strike. Thus, under the physical measure, the caplet should be less likely to end up in the money than end up out of the money. Does this roughly conform with a number around 46%?

For my thesis, I am using a simulation model that tries to create synthetic realizations of market data (in a complicated procedure that would go too far to explain here). In this synthetic world, I get ITM-likelihoods much lower than the above and I am wondering about real world estimates from a practitioner's point of view.

by ettlich at September 02, 2015 10:26 AM


How Do I Read This Hole Error

I am trying to write an implementation of flatMap. But this is inconsequential, I'm really interested in understanding the error message. But for context so far I have:

flatMap ::
  (a -> List b)
  -> List a
  -> List b
flatMap f xs = foldRight undefined undefined _undefined

This gives the following error for the type hole:

Found hole ‘_undefined’ with type: List a0
    Where: ‘a0’ is an ambiguous type variable
    Relevant bindings include
      xs :: List a (bound at src/Course/List.hs:266:11)
      f :: a -> List b (bound at src/Course/List.hs:266:9)
      flatMap :: (a -> List b) -> List a -> List b
        (bound at src/Course/List.hs:266:1)
    In the third argument of ‘foldRight’, namely ‘_undefined’
    In the expression: foldRight undefined undefined _undefined
    In an equation for ‘flatMap’:
        flatMap f xs = foldRight undefined undefined _undefined

I know the right peg for this hole is xs, but I don't understand why the Haskell compiler has given it a type denoted List a0, and how that relates to the type I'm going to use, List a.

I think it has something to do with the fact that the hole could take a superset of the type List a or something? But why then didn't it just call it List b instead of specifically List a0?

Cheers, Jim

by JimmyP at September 02, 2015 09:37 AM

Lambda the Ultimate Forum

The most obsolete infrastructure money could buy - my worst job ever

A funny article by Juho Snellman about really existing legacy software engineering and PLT.

For example on my first day I found that X was running what was supposedly [the] largest VAXcluster remaining in the world, for doing their production builds. Yes, dozens of VAXen running VMS, working as a cross-compile farm, producing x86 code. You might wonder a bit about the viability of the VAX as computing platform in the year 2005. Especially for something as cpu-bound as compiling. But don't worry, one of my new coworkers had as their current task evaluating whether this should be migrated to VMS/Alpha or to VMS/VAX running under a VAX emulator on x86-64!

Why did this company need to maintain a specific C compiler anyway? Well, they had their own ingenious in-house programming language that you could think of as an imperative Erlang with a Pascal-like syntax that was compiled to C source. I have no real data on how much code was written in that language, but it'd have to be tens of millions of lines at a minimum.

The result of compiling this C code would then be run on an ingenious in-house operating system that was written in, IIRC, the late 80s. This operating system used the 386's segment registers to implement multitasking and message passing. For this, they needed a compiler with much more support for segment registers than normal. Now, you might wonder about the wisdom of relying on segment registers heavily in the year 2005. After all, use of segment registers had been getting slower and slower with every generation of CPUs, and in x86-64 the segmentation support was essentially removed. But don't worry, there was a project underway to migrate all of this code to run on Solaris instead.

September 02, 2015 09:11 AM


Planet FreeBSD

Planet Theory

TR15-144 | Explicit resilient functions matching Ajtai-Linial | Raghu Meka

A Boolean function on n variables is q-resilient if for any subset of at most q variables, the function is very likely to be determined by a uniformly random assignment to the remaining n-q variables; in other words, no coalition of at most q variables has significant influence on the function. Resilient functions have been extensively studied with a variety of applications in cryptography, distributed computing, and pseudorandomness. The best known balanced resilient function on n variables due to Ajtai and Linial ([AL93]) is Omega(n/(log^2 n))-resilient. However, the construction of Ajtai and Linial is by the probabilistic method and does not give an efficiently computable function. In this work we give an explicit monotone depth three almost-balanced Boolean function on n bits that is Omega(n/(log^2 n))-resilient, matching the work of Ajtai and Linial. The best previous explicit constructions, due to Meka [Meka09] (which only gives a logarithmic depth function) and Chattopadhyay and Zuckerman [CZ15], were only n^{1-c}-resilient for any constant c < 1. Our construction and analysis are motivated by (and simplify parts of) the recent breakthrough of [CZ15] giving explicit two-source extractors for polylogarithmic min-entropy; a key ingredient in their result was the construction of explicit constant-depth resilient functions. An important ingredient in our construction is a new randomness-optimal oblivious sampler which preserves moment generating functions of sums of variables and could be useful elsewhere.

September 02, 2015 07:55 AM


Native EFI Bootloader Support for OpenBSD

As the j2k15 hackathon comes to a close, OpenBSD gets its very own native EFI bootloader.

On Twitter, Yojiro UO (yuo@-san) posted a list of systems tested and working with the new bootloader.

As you can see, a number of EFI-only systems are now successfully booting OpenBSD.

And, as seen on Twitter, Yasuoka@-san posted a little teaser with the dmesg from an EFI-only MinnowBoard.

September 02, 2015 07:45 AM


starting android [on hold]

I have installed Android SDK studio and eclipse platform. The two have been integrated as per the instructions.

Now how to check if the integration was proper?

Basically how am I supposed to run a hello world program to check if everything is going well? What are the steps I need to follow.

by Snowbell at September 02, 2015 07:32 AM

doubt about phones present in the dictionary [on hold]

I am using KALDI speech recognition toolkit to train my system for speech recognition and I am using ILSL12 naming convention to represent various sounds.

I am facing a challenge in dictionary update:

When breaking a word into phones (or individual sounds), for all the complete sounds in the middle of the word I add an 'a' phone to represent the complete sound.

However, when a similar complete sound occurs at the end of the word, then I am instructed not to put that 'a' sound at the end.

Is it fair?

For Example:

Suppose the word is nal (word has complete 'n' sound and complete 'l' sound) When breaking it into phones, I am writing

'n a l'

The 'a' after 'n' means that 'n' has a complete sound (as was already given). But 'l' also has the complete 'l' sound, so why are we not writing nal as 'n a l a'?

by Snowbell at September 02, 2015 07:23 AM

recursive function as circular function

I recently learnt that a recursive function is also referred to as a circular function.

But a recursive function cannot really be circular, because even though there is a call to the function itself, it does not happen indefinitely. There is always a base case (or condition) from which the function is supposed to exit in one way or another (either by displaying some message or by returning some value).

So, isn't calling a recursive function a "circular function" a misnomer?
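The base-case point can be illustrated with a small JavaScript sketch (my own example, not from the question):

```javascript
// The chain of self-calls n, n-1, ..., 1 always terminates at the base
// case, so the "circle" of self-calls is really a finite chain.
function factorial(n) {
  if (n <= 1) return 1;          // base case: recursion stops here
  return n * factorial(n - 1);   // recursive case
}
console.log(factorial(5)); // 120
```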

by Snowbell at September 02, 2015 07:05 AM



What is the difference between vim and vim-lite?

I want to install vim on my FreeBSD machine, and this other site suggests installing from FreeBSD ports like this:

# cd /usr/ports/editors/vim-lite/
# make install clean

This is all great, and I intend to install from the FreeBSD ports. However, I notice that there is also this other directory on my machine:


What is the difference between vim and vim-lite? And why would I want one versus the other? Obviously, vim-lite is "lighter"; but what am I sacrificing to have a lighter version?

by Sildoreth at September 02, 2015 06:31 AM



How to calculate yield spread?

I came across this multiple choice question on yield spread and I can't understand why the reasoning for the selected answer is correct. Can you confirm or clarify?

(emphasis in the text is mine)

The bond's market price of 103.73 can be computed by discounting its cash flows continuously at 4.0% per annum, which is represented by the flat yellow line. Specifically:

$3.00e^{-0.04 \times 0.5} + 3.00e^{-0.04 \times 1.0} + 3.00e^{-0.04 \times 1.5} + 103.00e^{-0.04 \times 2.0} = 103.73$
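As a sanity check, the discounting above can be reproduced numerically (a sketch; the cash flows and the flat 4% continuous yield are taken from the question):

```javascript
// Discount each [time, cash flow] pair continuously at the flat 4% yield.
var cashflows = [[0.5, 3.0], [1.0, 3.0], [1.5, 3.0], [2.0, 103.0]];
var y = 0.04;
var price = cashflows.reduce(function (sum, cf) {
  return sum + cf[1] * Math.exp(-y * cf[0]);
}, 0);
console.log(price.toFixed(2)); // 103.73
```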

The bond's same market price of $103.73 can also be derived by discounting the same cash flows according to the continuous discount rates given by the steep blue line. The lower steep line, which shows a rate of 0.40% at six months, is actually two nearby curves: a swap rate curve and a nearby spot rate curve.

Both start at 0.40% but, as the spot rate curve is slightly steeper, by year 2.0 the spot rate is 1.61% while the swap rate is 1.60%. For this purpose, we assume both the spot and swap curves are risk-free; e.g., US Treasury.

Each of the following is true about this bond EXCEPT which is false?

a) The bond's yield-to-maturity is 4.0%

b) The yield spread, represented by the solid red vertical arrow, is the difference between 4.0% (yellow line) and 0.40% (spot rate at six months)

c) If the price of the bond decreased due solely to perceived credit risk of bond (without any change in market risk), the upper curves (yellow and blue) would shift up

d) The z-spread, represented by the dashed red vertical arrow, is the difference between the (upper steep) blue line and the (lower steep) spot rate; e.g., 2.42% = 4.03% - 1.61%

The given answer is:

B. The yield spread is the "single-point" difference between the yield to maturity (YTM) and the corresponding riskless spot rate; in this case, yield spread ~= 4.0% - 1.61% ~= 2.39%. As the endpoints at year 2.0 are near to each other, this yield spread is only slightly less than the z-spread of 2.42%, due primarily to the fact that the upper blue line endpoint must necessarily be slightly greater than the 4.0% YTM. If the corresponding riskless swap rate is necessarily interpolated, then this yield spread is an i-spread.

Isn't yield spread the difference in yield to maturity between the corporate bond and the risk free bond? In this question, 1.61 is the 2 year spot and not a yield to maturity. What am I missing?

by AfterWorkGuinness at September 02, 2015 06:27 AM



Given cash flows, what is the interest rate of the following contract? [on hold]

I am presented with an investment opportunity where I am given #481,000 on day 1. Thereafter, I am required to give back #50,000 every 10 days for 100 days (10 × 50,000 = 500,000).

How do I calculate the interest rate I am paying?

I am guessing I have to use the present value of annuity problem to find out the interest rate.

So, my present value is #481,000. My "annuity" is 50000 every 10 days. First payment is due on the 10th day. Last payment is due on day 100.

Plugging the above values into WolframAlpha, I get 0.7107% for the interest rate per 10-day period. I divide that by 10 to get the per-day interest rate and multiply by 365 to get 25.94% per year.

I am surprised to see the above answer. It is a lot more than 14%, which is what the rate would be if I were to pay 500,000 at the end of 100 days. Is my reasoning incorrect?
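The rate-solving step can be sketched numerically with a simple bisection on the ordinary-annuity present-value formula (the function names are mine; the figures are from the question):

```javascript
// PV of an ordinary annuity of n payments: payment * (1 - (1 + r)^-n) / r
function annuityPV(rate, payment, n) {
  return payment * (1 - Math.pow(1 + rate, -n)) / rate;
}

// Bisect for the periodic rate r with annuityPV(r, 50000, 10) = 481000.
// annuityPV decreases in the rate, so a PV that is too high means r is too low.
function solveRate(pv, payment, n) {
  var lo = 1e-9, hi = 1.0;
  while (hi - lo > 1e-12) {
    var mid = (lo + hi) / 2;
    if (annuityPV(mid, payment, n) > pv) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}

var r = solveRate(481000, 50000, 10);  // rate per 10-day period, ~0.71%
var annual = (r / 10) * 365;           // simple annualization, ~25.9%
console.log((r * 100).toFixed(4) + "% per period, " +
            (annual * 100).toFixed(2) + "% per year");
```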

by newbie at September 02, 2015 06:06 AM


Javascript Reduce: Initial Value

I would like help in writing my own version of the reduce method. I need help for when the initial value is omitted. I could use the shift() method, but I believe the testers want something more elegant.

var reduce = function (arr, iterator, start) {
  var result = start;
  arr.forEach(function (element, index) {
    result = iterator(result, element, index, arr);
  });
  return result;
};
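One way to handle the omitted initial value without resorting to shift() is to detect the missing argument and seed the accumulator with the first element instead — a sketch (this mirrors how the native Array.prototype.reduce behaves):

```javascript
var reduce = function (arr, iterator, start) {
  var hasStart = arguments.length >= 3;  // was an initial value supplied?
  var result = start;
  arr.forEach(function (element, index) {
    if (!hasStart && index === 0) {
      result = element;                  // seed with the first element
    } else {
      result = iterator(result, element, index, arr);
    }
  });
  return result;
};
```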

by Ju Juu at September 02, 2015 05:46 AM



How do I allow user permissions to data on a Couchpotato and Sickbeard Plugin using Freenas

Trying to setup permissions for the plugins sickbeard / couchpotato.

I’ve read almost every thread google / the forums have but haven’t really found a solution yet.
I’m assuming it’s the data here: (Jail name)/usr/pbi/xxxxxxxxxx
I tried:

chown -R guest:guest /usr/pbi/sickbeard-amd64/*  
chown -R guest:guest /usr/pbi/couchpotato-amd64/*  

But I couldn’t get it to work.
Or would I have to chmod 777 the folders?

by Supa at September 02, 2015 03:50 AM


How can I load text macros in emacs?

Here's what I want to do.

  1. Open emacs.
  2. I write a lot of emails during my work day. I want to have it so that I can do something like type "th", use a keystroke, and paste in "Thank you for your message." Alternatively, I type "ed", use a keystroke, and paste in "To do that, you'll want to hit the top right button..." At the end, I'll just save the file, and paste it in to the clipboard (I know how to do that part). You get the idea.
  3. I get the feeling I can do this using the init.el file to define functions, and then M-x to run the functions, but I'd like to see examples and a sample function of that type, to fill in those pieces.
submitted by moscowramada

September 02, 2015 03:38 AM

Wes Felter

"The problem of finding opportunities can be speedily solved by recursively applying a FIFO..."

“The problem of finding opportunities can be speedily solved by recursively applying a FIFO queue-based Bellman Ford algorithm to a graph of the negated logarithms of the ratios of the funded tips of all order books to find negative cycles and removing the smallest offer in each cycle on each recursion.”

- Donovan Hide on Exploiting Ripple Transaction Ordering For Fun And Profit
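A minimal sketch of the core idea (my own illustration, using plain Bellman-Ford rather than the FIFO-queue variant the quote describes): taking edge weights as the negated logs of the exchange ratios turns a profitable cycle (product of ratios > 1) into a negative-weight cycle.

```javascript
// Detect a negative cycle with Bellman-Ford. Initializing all distances
// to 0 acts like a virtual source, so cycles anywhere are found.
function hasNegativeCycle(numNodes, edges) {
  var dist = new Array(numNodes).fill(0);
  for (var i = 0; i < numNodes - 1; i++) {
    for (var j = 0; j < edges.length; j++) {
      var e = edges[j];
      if (dist[e.from] + e.weight < dist[e.to]) {
        dist[e.to] = dist[e.from] + e.weight;
      }
    }
  }
  // One extra pass: any further relaxation implies a negative cycle.
  return edges.some(function (e) {
    return dist[e.from] + e.weight < dist[e.to];
  });
}

// Rates: A->B 2.0, B->C 3.0, C->A 0.2; product 1.2 > 1, so arbitrage.
var edges = [
  { from: 0, to: 1, weight: -Math.log(2.0) },
  { from: 1, to: 2, weight: -Math.log(3.0) },
  { from: 2, to: 0, weight: -Math.log(0.2) },
];
console.log(hasNegativeCycle(3, edges)); // true
```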

September 02, 2015 01:49 AM


arXiv Programming Languages

Program Synthesis using Natural Language. (arXiv:1509.00413v1 [cs.PL])

Interacting with computers is a ubiquitous activity for millions of people. Repetitive or specialized tasks often require creation of small, often one-off, programs. End-users struggle with learning and using the myriad of domain-specific languages (DSLs) to effectively accomplish these tasks.

We present a general framework for constructing program synthesizers that take natural language (NL) inputs and produce expressions in a target DSL. The framework takes as input a DSL definition and training data consisting of NL/DSL pairs. From these it constructs a synthesizer by learning optimal weights and classifiers (using NLP features) that rank the outputs of a keyword-programming based translation. We applied our framework to three domains: repetitive text editing, an intelligent tutoring system, and flight information queries. On 1200+ English descriptions, the respective synthesizers rank the desired program within the top-1 and top-3 results for 80% and 90% of descriptions, respectively.

by Aditya Desai, Sumit Gulwani, Vineet Hingorani, Nidhi Jain, Amey Karkare, Mark Marron, Sailesh R, Subhajit Roy at September 02, 2015 01:30 AM

Reception Probabilities in 5G Vehicular Communications close to Intersections. (arXiv:1509.00399v1 [cs.SY])

Vehicular networks allow vehicles to constantly share information with each other and their surroundings, and are expected to be an integral part of future intelligent transportation systems (ITS). However, the diversity of ITS applications, in combination with the extremely stringent demands posed in particular by safety-critical applications, makes the design of vehicular communication a challenging task. 5G device-to-device communication as well as 802.11p are promising to address this challenge. In order to guide and validate the design process, analytical expressions of key performance metrics such as outage probability and throughput are necessary. In this paper, we focus on the intersection scenario, and present a general procedure to analytically determine the success probability of a selected link as well as system-wide throughput. We provide an overview of the salient properties of vehicular communication systems near intersections and show how the procedure can be used to model signal propagation conditions typical of different environments of practical relevance, for instance rural and urban scenarios. The results indicate that the procedure is sufficiently general and flexible to deal with a variety of scenarios, and can thus serve as a useful design tool for communication system engineers, complementing simulations and experiments.

by Erik Steinmetz, Matthias Wildemeersch, Tony Q.S. Quek, Henk Wymeersch at September 02, 2015 01:30 AM