# Planet Primates

## November 25, 2015

### StackOverflow

#### Tutorials For Natural Language Processing

I recently attended a Coursera class on Natural Language Processing and learnt a lot about parsing, IR, and other interesting aspects like question answering. Although I grasped the concepts well, I did not get any practical knowledge out of it. Can anyone suggest good online tutorials or books for Natural Language Processing?

Thanks

### StackOverflow

#### Scikit-learn performance drop after using FeatureUnion

I am training a classifier on text data. Without FeatureUnion, I get an accuracy of 90%; with it, accuracy drops to 20%...

The features used are n-grams as well as other features I extract myself. I'm new to scikit-learn, so at first I set up the code as follows (evaluating the individual feature contribution). I get an accuracy of 90% on the test data.

```python
print 'Evaluating individual contribution of N-GRAMS'
# dev/test_questions and dev/test_target have been extracted from files
# extract the text from the raw data set
dev = {'data': [q.normalized_text for q in dev_questions], 'target': dev_target}
test = {'data': [q.normalized_text for q in test_questions], 'target': test_target}

parameters = {'countvect__ngram_range': [(1, 1), (1, 2)],
              'clf__alpha': (1e-3, 1e-4),
              'clf__loss': ('hinge', 'modified_huber')}

text_clf = Pipeline([
    ('countvect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier()),
])

gs_clf = GridSearchCV(text_clf, parameters, n_jobs=-1)
_ = gs_clf.fit(dev['data'], dev['target'])
predicted = gs_clf.predict(test['data'])

print accuracy_score(test["target"], predicted)
print classification_report(test["target"], predicted)
# ~90%
```


Now, since I need to combine multiple features, I use a FeatureUnion. I first defined a TextExtractor which takes raw Question objects and returns their text:

```python
class TextExtractor(BaseEstimator, TransformerMixin):
    """Extract text from each question, to be used for language modelling."""

    def __init__(self, normalized=False):
        self.normalized = normalized

    def fit(self, x, y=None):
        return self

    def transform(self, questions):
        print 'TextExtractor> transforming'
        if self.normalized:
            return [q.normalized_text for q in questions]
        else:
            return [q.text for q in questions]
```


And to train the classifier, I use the following code. The accuracy drops to 20% on the same data set.

```python
pipeline = Pipeline([
    # Use FeatureUnion to combine the features
    ('union', FeatureUnion(
        transformer_list=[
            # WORD SHAPE <--- example of another feature I would include
            # ('word_shape', Pipeline([
            #     ('selector', WordShapeExtractor()),
            #     ('vect', CountVectorizer(ngram_range=(1, 5))),
            #     ('tfidf', TfidfTransformer(use_idf=True, norm='l2')),
            #     # ('best')  # TODO
            # ])),

            # N-GRAMS
            ('ngrams', Pipeline([
                ('extractor', TextExtractor()),
                ('vect', CountVectorizer()),
                ('tfidf', TfidfTransformer()),
            ])),
        ],
    )),

    ('clf', SGDClassifier()),
])
```

```python
gs_clf = GridSearchCV(pipeline, parameters, n_jobs=1)
_ = gs_clf.fit(dev['data'], dev['target'])

predicted = gs_clf.predict(test['data'])

print accuracy_score(test["target"], predicted)
print classification_report(test["target"], predicted)
# ~20% on the same data set!
```


Am I using FeatureUnion wrong here?

Thanks
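Two things worth checking (guesses from the code shown, not a confirmed diagnosis): first, the grid-search parameter names from the flat pipeline (`countvect__ngram_range`, etc.) no longer match any step once everything is nested inside the union; second, `TextExtractor()` defaults to `normalized=False`, so the union trains on `q.text` while the first experiment used `q.normalized_text`. A minimal sketch of how parameters are addressed through a nested FeatureUnion (the asker's `TextExtractor` step is omitted so the sketch is self-contained):

```python
# Sketch: with a FeatureUnion nested inside a Pipeline, grid-search parameter
# names must spell out the full step path with double underscores.
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier

pipeline = Pipeline([
    ('union', FeatureUnion([
        ('ngrams', Pipeline([
            ('vect', CountVectorizer()),
            ('tfidf', TfidfTransformer()),
        ])),
    ])),
    ('clf', SGDClassifier()),
])

# The vectorizer's n-gram range is now addressed through every enclosing step:
parameters = {'union__ngrams__vect__ngram_range': [(1, 1), (1, 2)],
              'clf__alpha': (1e-3, 1e-4)}

# get_params(deep=True) lists every tunable name; the old flat names are gone.
assert 'union__ngrams__vect__ngram_range' in pipeline.get_params()
assert 'countvect__ngram_range' not in pipeline.get_params()
```

Passing the old flat names to GridSearchCV would normally raise a ValueError rather than silently degrade accuracy, so this is only one possible piece of the puzzle.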

### CompsciOverflow

#### Proving "QUESTION" is NP-Complete by reduction from n-variable 3SAT

I'm struggling with a problem in my theory of computation course that asks us to prove "QUESTION" is NP-complete by reduction from n-variable 3SAT. I've done a number of similar reductions, but I keep getting stumped on this particular problem.

We define a question as a string over the alphabet $\{0,1,?\}$ and say that a question covers all of the strings obtained by substituting each ? with a 0 or a 1. For example, $0??1$ covers the four strings $0001$, $0011$, $0101$, $0111$.

To show NP-completeness, we have to reduce 3SAT to QUESTION = {A : A is a set of questions, each of length $n$, such that there exists a string $w$ of length $n$ that no question in A covers}.

I recognize that some instances of QUESTION will "cover" fewer than $2^n$ strings, but I'm pretty stumped on how to go forward, as everything I've tried ends up not working out.
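Not a full proof, but the standard shape of such reductions may help (this is my sketch, not necessarily the intended construction): build one question per clause that covers exactly the assignments falsifying that clause.

```latex
% One question per clause, over positions 1..n (one position per variable):
%   put 0 at position i if the clause contains x_i,
%   put 1 at position i if the clause contains \neg x_i,
%   put ? at every other position.
% Example with n = 4:
(x_1 \lor \neg x_2 \lor x_3) \;\longmapsto\; 010?
% This question covers exactly the assignments that falsify the clause
% (here x_1 = 0, x_2 = 1, x_3 = 0, x_4 arbitrary), so a length-n string
% covered by no question in A is precisely a satisfying assignment of the
% 3SAT instance.
```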

### StackOverflow

#### Reverse list using map/reduce

I am learning the concepts of functional programming and trying out problem exercises. One exercise: reverse a list using map/reduce. My solution:

```python
lists = [1, 2, 3, 5, 6]

def tree_reverse(lists):
    return reduce(lambda a, x: a + [a.insert(0, x)], lists, [])

print tree_reverse(lists)
```


Output:

```
[6, 5, 3, 2, 1, None, None, None, None, None]
```


I don't understand why there are as many Nones as there are elements in the list.

EDIT: Extending question for case of nested lists.

```python
lists = [1, 2, 3, [5, 6, 7], 8, [9, 0, 1, 4]]
```
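A side note on the Nones, plus one side-effect-free way to write the reversal (a sketch, not the only functional answer): `list.insert` mutates the accumulator in place and returns `None`, so `a + [a.insert(0, x)]` tacks one `None` onto the result per element.

```python
from functools import reduce  # a builtin in Python 2, in functools on Python 3

# Pure reversal: build a new list by prepending each element to the
# accumulator, instead of mutating it with insert() (which returns None).
def tree_reverse(xs):
    return reduce(lambda acc, x: [x] + acc, xs, [])

print(tree_reverse([1, 2, 3, 5, 6]))  # [6, 5, 3, 2, 1]

# The same idea works unchanged for the nested-list case: inner lists are
# treated as single elements and are not themselves reversed.
print(tree_reverse([1, 2, 3, [5, 6, 7], 8, [9, 0, 1, 4]]))
```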


### QuantOverflow

#### Clarify a derivation in Pat Hagan's Convexity Conundrums

I am looking for help in understanding the algebraic derivation between some of the lines in Pat Hagan's famous Convexity Conundrums paper, e.g. how he goes from (3.4a) to (3.5a).

### CompsciOverflow

#### Denotational semantics for while - fixed points

In my book it is written:
$$[[\text{while b do S}]] = \text{FIX } F,$$
where $Fg = \text{cond}(\beta[[b]], g \circ [[S]], id)$.

What is cond? It is defined by $[[\text{if b then S1 else S2}]] = \text{cond}(\beta[[b]], [[S1]], [[S2]])$.

I don't understand why we should use $\text{FIX}$. What is $\text{FIX}$? For me it would be sufficient to write:
$$S_{ds}[[\text{while b do S}]] = \text{cond}(\beta[[b]], S_{ds}[[\text{while b do S}]] \circ S_{ds}[[S]], id)$$

DEFINITIONS

$$S_{ds}[[\text{while b do S}]] = \text{FIX } F, \quad \text{where } Fg = \text{cond}(\beta[[b]], g \circ S_{ds}[[S]], id)$$

$$\text{FIX}: ((State \rightarrow State) \rightarrow (State \rightarrow State)) \rightarrow (State \rightarrow State)$$

In brackets, the functions are partial.

### StackOverflow

#### List of all possible features in time series for classification

For classification algorithms (I use those in Python's sklearn), the hardest and most time-consuming part is always feature extraction. I'd like to have a full list of features we can extract from time series, or a list of the most important features. I think I, and a lot of other people, could achieve better classification results with more and better features.

Features I have so far:

**Statistical domain features:**

- mean
- mode
- percentiles (quartiles)
- interquartile range
- variance
- variation
- range between maximum and minimum
- entropy

**Frequency domain features** (for most of these we have to separate the frequency bands into parts like low frequency, high frequency, etc.):

- mean spectral power*
- mean frequency*
- spectral entropy*
- maximum frequency
- fractal dimension

**Predictive model features:**

- parameters of ARMA / ARIMA models

**Data-adaptive model features:**

- Piecewise Aggregate Approximation / APCA
- Symbolic Aggregate Approximation

**Other:**

- mean time between maxima / minima
- mean maxima / minima per x samples

Tags describing for which classification domains the features mostly promise good results would be cool too. If we get a more complete list, I could write a library which extracts all the features and selects the best with sklearn's RFECV or genetic algorithms.

Thank you for helping.

#### How to achieve master-master replication between more than two postgresql databases?
I would like to set up master-master replication between more than two PostgreSQL databases, in the following way. Consider three databases, namely db_main, db_1, db_2. There is a bi-directional replication setup (swap sync, maybe, in Bucardo terms) between db_main and db_1, and another between db_main and db_2. While db_1 and db_2 are not even directly connected, if I create table1 on db_1 and table2 on db_2, then both table1 and table2 should propagate to all three databases.

Is such a setup even possible? If yes, how? What level of consistency? Will the solution tolerate failures such as message loss due to network failure?

Thank you in advance.

### CompsciOverflow

#### How to build a label flow graph for static analysis

I'm new to the world of static analysis and am trying to build a new analysis of C programs for the LLVM compiler. I've started with building the graph of the constraints of the program: the edges represent the flow of data through the program (according to the statements or function calls), and the nodes represent the run-time memory locations. I'm wondering whether, for the labeling of the constraints, I need a symbol table with all the constraints and their labels.

I found out that we can build a CFG on top of LLVM by only parsing the LLVM IR. For example, we could just have:

```cpp
// Build list nodes without successors.
for (Function::iterator e = F.end(); e != BB; ++BB) {
    BI = BB->begin();
    for (BasicBlock::iterator BE = BB->end(); BI != BE; ++BI) {
        Instruction *instruction = dyn_cast<Instruction>(BI);
        StaticAnalysis::ListNode *node = new StaticAnalysis::ListNode(counter++);
        node->inst = instruction;
        helper.insert(pair<Instruction*, StaticAnalysis::ListNode*>(instruction, node));
        CFGNodes.push_back(node);
    }
}
```

So, my question is: would this also be possible for a flow graph, or is a symbol table needed to construct one?
### TheoryOverflow

#### Guessing an n-bit vector using a semi-reliable parity function

I'm stuck on this exercise from *Computational Complexity* by Arora & Barak (chapter 9).

Suppose somebody holds an unknown $n$-bit vector $a$. Whenever you present a randomly chosen subset of indices $S \subseteq \{1, \dots, n\}$, then with probability at least $1/2 + \epsilon$, she tells you the parity of all the bits in $a$ indexed by $S$. Describe a guessing strategy that allows you to guess $a$ with probability at least $(\epsilon/n)^c$ for some constant $c > 0$.

It is not said whether the guessing procedure needs to run in polynomial time, but I assume so. Any hint or answer will be appreciated.

### Dave Winer

#### Another test of replying to posts...

Hello again. I have been working on the editor, and have implemented a few minor fixes and one major fix. It should look and feel quite a bit tighter now. So if you have a moment...

1. If necessary, log into the site by choosing the last command in the menu at the right of the menu bar.
2. Choose Reply from the popup menu next to my name, above.
3. Enter a little text, a Thanksgiving greeting perhaps.
4. Click on the text to edit it.
5. Click the thumbs-up icon to Like this post.
6. Like your own reply!

Please report any problems. Thank you!

### StackOverflow

#### Why would I not need an ORM in a functional language like Scala?

I'm wondering whether I can switch from Java to Scala in a Spring + Hibernate project to take advantage of Scala features such as pattern matching, Option, and what seems to me a cleaner syntax in general. I've been looking for the default ORM in the Scala ecosystem and I've found things like Activate (but mostly I'm trying to find out whether Hibernate can be used with Scala). Searching for this, I read the following in the Play documentation about JPA + Scala:

> But the most important point is: do you really need a Relational to Objects mapper when you have the power of a functional language? Probably not.
> JPA is a convenient way to abstract Java's lack of power in data transformation, but it really feels wrong when you start to use it from Scala.

I do not have a deep understanding of the functional way of creating a complete application (that's why I intend to use Scala: I can pick it up incrementally, because it combines OO + functional), so I can't figure out why I would not need an ORM with a functional language, or what the functional approach to persisting the domain model would be. A DDD approach for the business logic still makes sense with Scala, doesn't it?

### Lobsters

#### Interesting Response to Death of Dynamic Languages Article

### CompsciOverflow

#### Non-binary self-balancing tree

I'm looking for a tree data structure that keeps the tree balanced in height (as small a height as possible). I mean, suppose a tree where:

- each node has a parameter k that is the maximum number of children that can be attached to it, 0 ≤ k ≤ N
- all the operations will be inserts and deletes (no search): it's just important that every node knows its children; I'm not interested at all in search (<0.1% of operations)

Would an adaptation of an RB-tree or AVL tree be a good idea for this task, or are there better solutions (other data structures, other kinds of tree, etc.)?
### StackOverflow

#### ImportError: No module named arff

Here's some simple code, as in this link, to read an arff file in Python (the commented one didn't work either):

```python
import arff
for row in arff.load('heart_train.arff'):
    print(row.sex)
```

And here's the error I receive:

```
python id3.py
Traceback (most recent call last):
  File "id3.py", line 1, in <module>
    import arff
ImportError: No module named arff
```

The "heart_train" arff file data looks like:

```
@relation cleveland-14-heart-disease
@attribute 'age' real
@attribute 'sex' { female, male}
@attribute 'cp' { typ_angina, asympt, non_anginal, atyp_angina}
@attribute 'trestbps' real
@attribute 'chol' real
@attribute 'fbs' { t, f}
@attribute 'restecg' { left_vent_hyper, normal, st_t_wave_abnormality}
@attribute 'thalach' real
@attribute 'exang' { no, yes}
@attribute 'oldpeak' real
@attribute 'slope' { up, flat, down}
@attribute 'ca' real
@attribute 'thal' { fixed_defect, normal, reversable_defect}
@attribute 'class' { negative, positive}
@data
63,male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,negative
37,male,non_anginal,130,250,f,normal,187,no,3.5,down,0,normal,negative
41,female,atyp_angina,130,204,f,left_vent_hyper,172,no,1.4,up,0,normal,negative
56,male,atyp_angina,120,236,f,normal,178,no,0.8,up,0,normal,negative
57,female,asympt,120,354,f,normal,163,yes,0.6,up,0,normal,negative
57,male,asympt,140,192,f,normal,148,no,0.4,flat,0,fixed_defect,negative
...
```

### CompsciOverflow

#### How to fix pipeline data hazards in the following MIPS instructions?

MIPS instructions:

```
add s2, s4, s1
sub t0, s2, s5
lw  t1, 4(t0)
add s3, s4, s1
```

A. Identify and explain any examples of pipeline hazards that you can find in this sequence of instructions.

I think it has data hazards, because the second line depends on completion of data access by the first line, and the third line depends on completion of data access by the second line. Am I right?

B. Can you reorganize the instructions to eliminate the hazards? If so, how?
Can these MIPS instructions be reorganized? How would I approach this?

#### What software development method would be appropriate for a homomorphic encryption research project?

Right now I am playing with homomorphic encryption (https://en.wikipedia.org/wiki/Homomorphic_encryption); namely, I want to make software that sorts a vector of encrypted integers and returns the sorted vector of encrypted integers, something like this: "Comparison-Based Applications for fully homomorphic encrypted data" (http://www.acad.ro/sectii2002/proceedings/doc2015-3s/08-Togan.pdf). I have already begun to write some pieces of the software, but along the road I have had some difficulties, and although it's a small project, I believe that I could use a software development method to make things easier (speed up the project development cycle).

What software development methodology would be more appropriate for a research project: Scrum, XP, something else? I have never dealt with research projects and now I am facing something new. My goal is to code an implementation of something similar to the application described in the above reference, but with better parameter settings and timing performance, and based on a newer underlying scheme.

#### What are the practical uses of ontologies?

I have read many papers and books about ontologies and I am trying to figure out how they are used in a real project. For example, how can the ontology for a soccer-playing robot be defined and used with a cognitive architecture in order to make it intelligent? Are ontologies relations between terms in a domain of knowledge (for example, the relation between the words "ball" and "foot", the definition of physical rules and their relation to the movement of foot and ball, ...) or relations between tactics, strategies, and different mixtures of tactics?
Are there any clear examples of ontology usage in real projects, and of combining ontologies with cognitive architectures like ACT-R in order to augment the cognitive architecture?

#### Performance of microkernel vs monolithic kernel

A microkernel implements all drivers as user-space programs and implements core features like IPC in the kernel itself. A monolithic kernel, however, implements the drivers as part of the kernel (i.e. they run in kernel mode).

I have read some claims that microkernels are slower than monolithic kernels, since they need to handle message passing between the drivers in user space. Is this true?

For a long time, most kernels were monolithic because the hardware was too slow to run microkernels quickly. However, there are now many microkernels and hybrid kernels, like GNU/Hurd, Mac OS X, the Windows NT line, etc. So, has anything changed about the performance of microkernels? Is this criticism of microkernels still valid today?

### Fefe

#### The surviving Russian pilot denies that the ...

The surviving Russian pilot denies that Turkey issued a warning before the shoot-down. The Turkish prime minister boasts of having personally ordered the shoot-down. Will he soon have a regrettable car accident, or die in a private-jet crash?

Incidentally, a laugh on the side: Turkish nationalists did want to gather in front of the Russian consulate in Istanbul and protest against Russia's air strikes on Turkmen militias in northern Syria. Eggs were even thrown. But they hit the neighboring consulate of the Netherlands. The demonstrators had mixed up the flags. And that is why schooling is important, dear readers!

### DataTau

#### Rotation Forest in Python

### Dave Winer

#### Let Hoder talk to you for a bit...

Did you read Hoder's piece about saving the web? He was in jail in Iran for six years, while the flow of the web was taken over by social media. That gave him a unique perspective on what was lost.
If you haven't read it, and you love the web, please clear 15 good minutes, sit down with a cup of coffee or whatever you like to drink, listen to him, and think. And yes, it is ironic that he put the piece on Medium, which hopes to do a bit more of the same unpleasantness to the web. But at least they support real hyperlinks, unlike Twitter and Facebook.

### Lobsters

#### OpenBSD support for psutil

### StackOverflow

#### Converting words to their roots

Is there an efficient way to convert all variations of the words in a corpus (in a language you're not familiar with) to their roots? In English, for example, this would mean converting plays, played, and playing into play; did, does, done, and doing into do; birds into bird; and so on.

The idea I have is to iterate through the less frequent words and test whether a substring of each word is one of the more frequent words. I don't think this is good because, first, it would not handle irregular verbs and, second, I'm not sure that the "root" of a word is always more frequent than its other variations. This method might also erroneously change some words that are totally different from the frequent word contained in them.

The reason I want to do this is that I'm working on a classification problem and figured I'd get better results if I improved my preprocessing step. If you've done anything similar or have an idea, please do share. Thank you!

### CompsciOverflow

#### Transforming training data for machine learning algorithms

If you want to make good predictions with machine learning (supervised learning in particular), you need a good training set. And relevant predictors in your feature set can be overshadowed by irrelevant training data. Validation can help you figure out which training data is good; you can then remove irrelevant inputs. But what if "irrelevant" data contains something useful, and it just isn't obvious?
I have an exercise schedule, and you can predict what kind of exercise, if any, I'll do on a given day based on the day of the week and my location. The same predictability holds for my friend, and for just about anyone who exercises regularly. If I throw my friend's position and day-of-the-week data into my prediction algorithm, then I may overfit because (a) the day of the week would be repeated, and (b) my friend's position is not linearly correlated with my schedule, so it could be mistaken for irrelevant noise. Simply dumping his training set into my training set without any thought is probably bad.

However, if my friend and I are in the same location on a given day, then we are very likely to exercise together, and I follow his schedule when we exercise together. It doesn't matter whether my friend is across the city or across the world; his position only becomes strongly correlated with my schedule when he is very close to me. So there is something "hidden" in my friend's seemingly irrelevant position data that might be missed.

The general problem is: given a set of predictors
$$P = \{x_1, x_2, x_3, \dots\}$$
for some data set
$$\{y_i, x_{1i}, x_{2i}, x_{3i}, \dots\}_{i=1}^n,$$
we want to predict $\{y_i\}$.

We can do this using supervised machine learning techniques, and we can also try linear regression:
$$y_i = a_1 x_{1i} + a_2 x_{2i} + a_3 x_{3i} + \dots + e_i$$

Now, consider the set of all subsets of $P$:
$$U = \{U_j \mid U_j \subseteq P\}$$

And consider the set of nonlinear functions of all subsets of $P$ that have nonzero correlation with $y$ (this is messy notation, but it is just any function of any combination of predictors such that the function has nonzero correlation with whatever it is you are predicting).
$$F = \{[\{f_{1j}(U_j)\}, \{f_{2j}(U_j)\}, \{f_{3j}(U_j)\}, \dots] \mid \{\mathrm{corr}(\{f_{kj}(U_j)\}, \{y_i\})\} \cap \{0\} = \varnothing\}$$

$F$ is a massive set, and it contains the function for any given nonlinear machine learning technique. Furthermore, the "ultimate" nonlinear predictive function for any data set would be contained in $F$, and it may not be the conclusion of any machine learning technique.

Clearly, the linear regression model and $F$ share no elements by construction. However, they may share linear terms if the linear-term coefficients are ignored. Every element of $F$ can be broken up into a sum of linear and nonlinear terms (even if there are no linear terms).

Now, assume that the "ultimate" nonlinear predictive function contains linear and nonlinear terms such that the nonlinear terms can be approximated by low-order polynomials and have nonzero correlation with $y$, and are therefore elements of $F$. I propose that the sum of these low-order polynomials and linear terms can approximate the "ultimate" nonlinear predictive function reasonably well. But that presupposes already knowing the "ultimate" function.

Now, say we want to find this function and we keep the assumptions we made. Given a set of predictors, could we search through the set $F$ by fitting low-order polynomials, add these nonlinear functions to our set of predictors, and then add and subtract elements of our predictor set with an iterative technique that uses linear regression to arrive at a good approximation of the "ultimate" predictive function? Are these assumptions reasonable for real-world data sets? And could this be a more general approach than machine learning techniques?

I have tried implementing code that attempts to achieve this, and I seem to be able to generate new predictor sets that yield better predictions when used in KNN, compared to analogous KNN predictions from the original predictor sets.
So far, I have generally been able to boost KNN predictive accuracy by 3-4% for all k values of various distance measures and metrics, as long as the data sets have a reasonable distribution. But I would like to hear some opinions about this idea, and whether it is a reasonable approach or mathematically sound at all.

I realize that when all possible functions in $F$ are considered, this algorithm becomes huge and computationally very expensive. But perhaps you should only check low-order polynomials that are functions of a reasonable number of features from your set of predictors, and only consider them for your model if they meet some performance conditions. Also, you probably don't need to double-count functions optimized for the same set of inputs. So if you only considered low-order polynomials of two features, you would only have to search through $(n^2 - n)/2$ instances, and low-order polynomial optimization isn't too computationally expensive. So if you limit the number of features for the functions created in this algorithm, it becomes much more realistic, though limiting the number of features could limit the algorithm's performance on more sophisticated data sets. There is probably some optimal balance here.

### Lobsters

#### Easy Forth - Interactive Ebook

#### robotjs

### CompsciOverflow

#### Understanding the proof of the halting problem

I came across the following example, which proves that the blank-tape halting problem is not decidable. I understand the proof technique, but I just don't see how the blank-tape problem is shown to be undecidable in the proof. It says $M_w$ starts with a blank tape but then writes $w$ onto the tape, so there is no difference from the general halting problem. How does this imply that the blank-tape halting problem is not decidable? I can't find the logical conclusion. Couldn't we just replace the blank-tape halting problem with another DECIDABLE problem and argue the same way?
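For what it's worth, here is the way this argument is usually read (a sketch of the standard reduction, not necessarily the book's exact wording):

```latex
% Reduce the general halting problem to the blank-tape halting problem.
% Given (M, w), construct M_w: on blank tape, first write w, then run M on w.
\langle M, w \rangle \;\longmapsto\; \langle M_w \rangle
\qquad\text{with}\qquad
M_w \text{ halts on blank tape} \iff M \text{ halts on } w.
% If a decider D existed for the blank-tape problem, then D(M_w) would
% decide whether M halts on w, contradicting the undecidability of halting.
% Replacing the blank-tape problem with a decidable problem breaks the
% argument: no computable map (M, w) -> instance could preserve the "iff",
% since composing it with that problem's decider would again decide halting.
```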
### Lobsters

#### Pyston talk recording

### Dave Winer

#### Interview fake Trump

I have a solution for TV re Trump. Instead of interviewing Trump himself, allowing him to talk over your challenges to his lies, just interview recordings of Trump, and interrupt him whenever you see fit. The way Jon Stewart used to do it on the Daily Show. You don't need to give him, the person, actual air time.

### CompsciOverflow

#### When do deadlocks occur?

I was reading about deadlocks in operating systems, where I came across the two examples below. Circles labelled $P_x$ are processes; squares labelled $R_x$ are resources. Each dot in a square represents a single instance of resource type $R_x$. An edge from $R_x$ to $P_x$ means an instance of resource $R_x$ is allocated to process $P_x$. An edge from $P_x$ to $R_x$ means process $P_x$ is waiting for an instance of resource $R_x$ to be allocated.

Now consider the two resource allocation graphs below. The example on the left involves a deadlock, while the one on the right does not. I can understand that in the right-side figure, if $P_2$ releases its instance of $R_1$, it can be assigned to $P_1$, breaking the circular wait. Or, if $P_4$ releases its instance of $R_2$, it can be assigned to $P_3$, breaking the circular wait. However, we cannot break the circular wait in the left-side figure.

While I can try this on any given resource allocation graph and decide whether there is a deadlock, I want to know whether there is a generalized rule which tells us what exactly contributes to the deadlock, especially when there are multiple instances of resources. I did not find any reference / book speaking of this clearly. So, after a bit of thinking, I came up with the following fact:

If there are multiple instances of the same resource, then for a deadlock to exist, for any combination of two processes, if both are allocated an instance of the same resource, both should be part of at least one cycle.
In the right-side figure above, there is no deadlock because:

- processes $P_2$ and $P_3$ are allocated instances of $R_1$, but they are not both part of any cycle
- similarly, processes $P_1$ and $P_4$ are allocated instances of $R_2$, but they are not both part of any cycle

In the left-side figure above, there is a deadlock because:

- processes $P_1$ and $P_2$ are allocated instances of resource $R_3$ and are part of the same cycle, $P_1$-$R_1$-$P_2$-$R_3$-$P_3$-$R_3$-$P_1$

So am I correct with the above realization? Or are there more aspects/conditions (of when a deadlock is present and when not) that I am missing? What I am asking is whether there is any other condition which, if met instead of the above one, will still result in a deadlock (in the context of multiple instances of resources, and apart from the four classic conditions of deadlock: mutual exclusion, no preemption, hold and wait, and circular wait).

#### On Ladner's theorem

In Ladner's theorem, can we take $\mathsf{NPI}$ problems to be problems in $\mathsf{SUBEXP}\cap\mathsf{NP}$? Is this a proper subset? Can we extend Ladner's theorem to show that $\mathsf{SUBEXP/poly}\cap\mathsf{NP/poly}\neq\emptyset$ holds? Does a proper extension of Ladner's theorem indicate an analogous proper subset in $\mathsf{SUBEXP/poly}\cap\mathsf{NP/poly}$?

#### Another version of the online set cover problem?

Here is a note about the online set cover problem: we are initially given the $m$ sets, but we do not know which elements they contain. At any time $t$, we get a new element $e_t$ and learn which sets contain $e_t$. We then have to irrevocably pick a set that will cover $e_t$ if it is not already covered. The goal is again to minimize the number of sets we pick.

However, I'd like to think about another version: we have a group of sets $\mathcal{S} = \{S_1, S_2, \cdots, S_m\}$ and a universal set $U = \{e_1, \cdots, e_n\}$. $U$ can be covered by $\mathcal{S}$, but we don't know what elements each $S_i \in \mathcal{S}$ contains.
Then $S_1$ comes, and now we know what elements $S_1$ contains. We must immediately decide whether to use $S_1$ to cover $U$ or to discard it. Then $S_2$ comes; we learn what elements $S_2$ contains and decide to keep or discard it. The process is repeated until the sets we keep cover $U$.

I want to ask whether there is an algorithm that works in the way I mentioned. Must we check all the $S_i \in \mathcal{S}$ in order to get a solution of set cover? If not, what is the expected number of sets that have to be checked in order to get a solution?

### Fefe

#### A psychology professor at New York University reports ...

A psychology professor at New York University reports on "the Yale problem". The man is, and I find this especially great because it sounds so weaselly, the "Thomas Cooley Professor of Ethical Leadership" :-)

The Yale problem is his euphemism for the recent affair with the Halloween costumes. He recounts giving a talk at an "elite private high school" on the US West Coast. It was about the question of whether one would rather have a climate of suppression or of debate over controversial ideas at university; his title is "Coddle U vs. Strengthen U", and he argues that it is like the immune system: you must not shield it from germs; rather, it grows stronger by being exposed to them.

So, he gives this talk at this high school. Then this happened:

> My talk had little to do with gender, but the second question was "So you think rape is OK?"

And there was finger-snapping to go with it. That is apparently the new protest gesture for such occasions. He says it can also be heard in the Yale video, but I did not notice it there. But the actual reason I am blogging this is these two paragraphs:

> After the first dozen questions I noticed that not a single questioner was male.
> I began to search the sea of hands asking to be called on and I did find one boy, who asked a question that indicated that he too was critical of my talk. But other than him, the 200 or so boys in the audience sat silently.
>
> After the Q&A, I got a half-standing ovation: almost all of the boys in the room stood up to cheer. And after the crowd broke up, a line of boys came up to me to thank me and shake my hand. Not a single girl came up to me afterward.

Personally, I have also noticed many men in the radfem corner, but then I deal with an older age bracket, not with high school (which for us corresponds to something like grades 5-10). He then raised the issue with the parents, and they responded like this:

> Their parents were angry to learn about how their sons were being treated and… there's no other word for it, bullied into submission by the girls, with the blessing of the teachers.

The interesting thing is that he was not really concerned with gender issues; rather, he saw that 100% of the teaching staff lean politically to the left, and that as a result the students never learn to engage with conservative thought, and instead react with shock, fear, exclusion, and rejection. Then he addresses the argument that this is only fair, since white men are getting their comeuppance after having been the oppressors for centuries:

> most students who are in a victim group for one topic are in the "oppressor" group for another. So everyone is on eggshells sometimes; all students at Centerville High learn to engage with books, ideas, and people using the twin habits of defensive self-censorship and vindictive protectiveness. And then… they go off to college and learn new ways to gain status by expressing collective anger at those who disagree.

I found this essay very interesting and recommend reading it.
### StackOverflow

#### How can I extract values from a String to create a case class instance in Scala

I'm trying to extract values from a tokenised String and create an (optional) case class instance from it. The String takes the form of:

val text = "name=John&surname=Smith"

I have a Person class which will accept both values:

case class Person(name: String, surname: String)

I have some code which does the conversion:

def findKeyValue(values: Array[String])(prefix: String): Option[String] =
  values.find(_.startsWith(prefix)).map(_.substring(prefix.length))

val fields: Array[String] = text.split("&")

val personOp = for {
  name <- findKeyValue(fields)("name=")
  surname <- findKeyValue(fields)("surname=")
} yield Person(name, surname)

While this yields the answer I need, I was wondering:

1. Is there a more efficient way to do this?
2. Is there a more Functional Programming-centric way to do this?

Some constraints:

1. The order of the name and surname fields in the text can change. The following is also valid:

val text = "surname=Smith&name=John"

2. There could be other fields which need to be ignored:

val text = "surname=Smith&name=John&age=25"

3. The solution needs to cater for when the text supplied is malformed or has none of the required fields.
4. The solution can't use reflection or macros.

### CompsciOverflow

#### Difference between Bayesian Networks and Dynamic Bayesian Networks

I'm studying Bayesian networks and want to clarify a couple of things with people who are more knowledgeable in the area than me. As far as I understand it, a Bayesian network (BN) is a directed acyclic graph (DAG) that encodes conditional dependencies between random variables. The graph is drawn in such a way that the distribution (dictated by a conditional probability table (CPT)) of a random variable conditioned on its parents is independent of all other random variables.
I'm assuming that, by definition, both the structure (nodes and edges of the DAG) and the entries of the CPT in a BN are assumed to be fixed in time. Now, I'm wondering about the distinction between BNs and Dynamic BNs (DBNs), specifically, where the dynamic term in a DBN arises from: Does this mean that the structure AND conditional dependencies between variables are time-varying? If so, is a BN with a fixed structure (DAG) but time-varying probabilities also considered a DBN (does this type of 'DBN' have a name)? I'm not sure if what I've said is correct. Please let me know if I went wrong anywhere or if there is a better way of thinking about this.

#### Construct matching for half of the vertices, in linear time

Suppose we have a graph G=(V,E) that is connected and K_{1,3}-free. Sumner proved that every claw-free connected graph with an even number of vertices has a perfect matching (so it is a maximum matching). Describe an algorithm to construct one matching of cardinality |V|/2 in G in time complexity O(|V|+|E|). I approached it this way: the BFS tree of G is acyclic, so a topological sort (postorder traversal) on it may solve the problem (via https://en.wikipedia.org/wiki/Claw-free_graph). But, looking at some examples, I realized that there may exist edges that don't belong to E.

### StackOverflow

#### How do I perform Naive Bayes Classification when specifically using a Bayesian Belief Network?

I've been writing a java library that I want to use to build Bayesian Belief Networks.
I have classes that I use to build a Directed Graph:

public class Node {
    private String label;
    private List<Node> adjacencyList = new ArrayList<Node>();
    private Frequency<String> distribution = new Frequency<String>();

    public String getLabel() {
        return label;
    }

    public void setLabel(String label) {
        this.label = label;
    }

    public List<Node> getAdjacencyList() {
        return adjacencyList;
    }

    public void addNeighbour(Node neighbour) {
        adjacencyList.add(neighbour);
    }

    public void setDistribution(List<String> data) {
        for (String s : data) {
            distribution.addValue(s);
        }
    }

    public double getDistributionValue(String value) {
        return distribution.getPct(value);
    }
}

Graph:

public class DirectedGraph {
    Map<String,Node> graph = new HashMap<String,Node>();

    public void addVertex(String label) {
        Node vertex = new Node();
        vertex.setLabel(label);
        graph.put(label, vertex);
    }

    public void addEdge(String here, String there) {
        Node nHere = graph.get(here);
        Node nThere = graph.get(there);
        nThere.addNeighbour(nHere);
        graph.put(there, nThere);
    }

    public List<Node> getNeighbors(String vertex) {
        return graph.get(vertex).getAdjacencyList();
    }

    public int degree(String vertex) {
        return graph.get(vertex).getAdjacencyList().size();
    }

    public boolean hasVertex(String vertex) {
        return graph.containsKey(vertex);
    }

    public boolean hasEdge(String here, String there) {
        Set<Node> nThere = new HashSet<Node>(graph.get(there).getAdjacencyList());
        boolean thereConHere = nThere.contains(here);
        return (thereConHere);
    }
}

I have a class that I use to keep track of the probability distribution of a data set:

public class Frequency<T extends Comparable<T>> {
    private Multiset event = HashMultiset.create();
    private Multimap event2 = LinkedListMultimap.create();

    public void addValue(T data) {
        if (event2.containsKey(data) == false) {
            event2.put(data, data);
        }
        event.add(data);
    }

    public void clear() {
        this.event = null;
        this.event2 = null;
        this.event = HashMultiset.create();
        this.event2 = LinkedListMultimap.create();
    }

    public double getPct(T data) {
        int numberOfIndElements = event.count(data);
        int totalNumOfElements = event.size();
        return (double) numberOfIndElements / totalNumOfElements;
    }

    public int getNum(T data) {
        int numberOfIndElements = event.count(data);
        return numberOfIndElements;
    }

    public int getSumFreq() {
        return event.size();
    }

    public int getUniqueCount() {
        return event.entrySet().size();
    }

    public String[] getKeys() {
        Set<String> test = event2.keySet();
        Object[] keys = test.toArray();
        String[] keysAsStrings = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            keysAsStrings[i] = (String) keys[i];
        }
        return keysAsStrings;
    }
}

as well as another function that I can use to calculate conditional probabilities:

public double conditionalProbability(List<String> interestedSet, List<String> reducingSet,
                                     String interestedClass, String reducingClass) {
    List<Integer> conditionalData = new LinkedList<Integer>();
    double returnProb = 0;
    iFrequency.clear();
    rFrequency.clear();
    this.setInterestedFrequency(interestedSet);
    this.setReducingFrequency(reducingSet);
    for (int i = 0; i < reducingSet.size(); i++) {
        if (reducingSet.get(i).equalsIgnoreCase(reducingClass)) {
            if (interestedSet.get(i).equalsIgnoreCase(interestedClass)) {
                conditionalData.add(i);
            }
        }
    }
    int numerator = conditionalData.size();
    int denominator = this.rFrequency.getNum(reducingClass);
    if (denominator != 0) {
        returnProb = (double) numerator / denominator;
    }
    iFrequency.clear();
    rFrequency.clear();
    return returnProb;
}

However, I'm still not sure how to hook everything up in order to perform classification. I was reading over a paper entitled Comparing Bayesian Network Classifiers to try and get an understanding. Let's say that I am trying to predict a person's sex based on the attributes height, weight and shoe size. My understanding is that I would have Sex as my parent/classification node, and height, weight and shoe size would be my child nodes. This is what I'm confused about.
The various classification nodes only keep track of the probability distribution of their respective attributes, but I'd need the conditional probabilities in order to perform classification. I have an older version of Naive Bayes that I wrote:

public void naiveBayes(Data data, List<String> targetClass, BayesOption bayesOption, boolean headers) {
    // initialize variables
    int numOfClasses = data.getNumOfKeys(); //.getHeaders().size();
    String[] keyNames = data.getKeys(); // data.getHeaders().toArray();
    double conditionalProb = 1.0;
    double prob = 1.0;
    String[] rClass;
    String priorName;
    iFrequency.clear();
    rFrequency.clear();
    if (bayesOption.compareTo(BayesOption.TRAIN) == 0) {
        this.setInterestedFrequency(targetClass);
        this.targetClassKeys = Util.convertToStringArray(iFrequency.getKeys());
        for (int i = 0; i < this.targetClassKeys.length; i++) {
            priors.put(this.targetClassKeys[i], iFrequency.getPct(this.targetClassKeys[i]));
        }
    }
    // for each classification in the target class
    for (int i = 0; i < this.targetClassKeys.length; i++) {
        // get all of the different classes for that variable
        for (int j = 0; j < numOfClasses; j++) {
            String reducingKey = Util.convertToString(keyNames[j]);
            List<String> reducingClass = data.dataColumn(reducingKey, DataOption.GET, true); // new ArrayList(data.getData().get(reducingKey));
            this.setReducingFrequency(reducingClass);
            Object[] reducingClassKeys = rFrequency.getKeys();
            rClass = Util.convertToStringArray(reducingClassKeys);
            for (int k = 0; k < reducingClassKeys.length; k++) {
                if (bayesOption.compareTo(BayesOption.TRAIN) == 0) {
                    conditionalProb = conditionalProbability(targetClass, reducingClass, this.targetClassKeys[i], rClass[k]);
                    priorName = this.targetClassKeys[i] + "|" + rClass[k];
                    priors.put(priorName, conditionalProb);
                }
                if (bayesOption.compareTo(BayesOption.PREDICT) == 0) {
                    priorName = this.targetClassKeys[i] + "|" + rClass[k];
                    prob = prob * priors.get(priorName);
                }
            }
            rFrequency.clear();
        }
        if (BayesOption.PREDICT.compareTo(bayesOption) == 0) {
            prob = prob * priors.get(this.targetClassKeys[i]);
            Pair<String,Double> pred = new Pair<String, Double>(this.targetClassKeys[i], prob);
            this.predictions.add(pred);
        }
    }
    this.iFrequency.clear();
    this.rFrequency.clear();
}

So I generally understand how the math works, but I'm not quite sure how I'm supposed to get things to work with this specific architecture. How do I calculate the conditional probabilities? Can somebody explain this discrepancy to me please?

### Lobsters

#### Dell’s Tumble, Google’s Fumble, And How Government Sabotage Of Internet Security Works

#### How our CSS framework helps enforce accessibility

### TheoryOverflow

#### On Ladner's theorem

In Ladner's theorem, can we take \mathsf{NPI} problems to be problems in \mathsf{SUBEXP}\cap\mathsf{NP}? Is this a proper subset? Can we extend Ladner's theorem to show that \mathsf{SUBEXP/poly}\cap\mathsf{NP/poly}\neq\emptyset holds? Does a proper extension of Ladner's theorem indicate an analogous proper subset in \mathsf{SUBEXP/poly}\cap\mathsf{NP/poly}?

### StackOverflow

#### Incremental training of ALS model

I'm trying to find out if it is possible to have "incremental training" on data using MLlib in Apache Spark. My platform is Prediction IO, and it's basically a wrapper for Spark (MLlib), HBase, ElasticSearch and some other Restful parts. In my app data "events" are inserted in real-time, but to get updated prediction results I need to "pio train" and "pio deploy". This takes some time and the server goes offline during the redeploy. I'm trying to figure out if I can do incremental training during the "predict" phase, but cannot find an answer.

#### chaining (or mapping) Task containing a single data array to an array of Tasks

Part of learning Fantasy Land/Folktale has led me to creating some code. I am essentially scanning my network (via someLib) and uploading the results to a mongo repository.
The scan returns an array of results, while the upsert into mongo needs to work on results independently (mongoose; this is my first time with this lib too, so I may be mistaken there). In a traditional promise-based model I would:

// step 0: setup
const someLibPomise = makePromiseOf(someLib)
// step 1: get data
const dataArray = yield someLibPomise()
// step 2: convert to array of promises to upsert
const promiseArray = _.map(dataArray, makeUpsertPromise)
// step 3: wait on results
const upsertResults = yield promiseArray

In and of itself, this is a pretty clean representation, but I want to push my comprehension of these functional techniques. My working version leaves a little bit to be desired, as I can't seem to get from the Task returned by the someLibTask function, which contains an array of objects, TO an array of Tasks representing the individual upserts. I feel there must be a better way. Here is what is working:

// step 0: setup
const someLibTask = Async.liftNode(someLib)

const cleanUpData = (dataArray) => {
  return _.map(dataArray, (data) => {
    // cleanup data object
    return data
  })
}

const upsertTask = (collection) => {
  return (criteria, record) => {
    return new Task((reject, resolve) => {
      const callback = (error, data) => {
        if (error) reject(error)
        else resolve(data)
      }
      collection.findOneAndUpdate(criteria, record, {upsert: true}, callback)
    })
  }
}

const persist = (data) => {
  mongoose.connect('mongodb://localhost/db');
  const someUpsert = adapt.upsertTask(Some.collection)
  const tasks = _.map(data, (record) => {
    const criteria = { "id": record.id }
    return serverUpsert(criteria, record)
  })
  return Async.parallel(tasks).fork(console.error, process.exit)
}

// step 1: make a query and return an array of objects
// () => Task(object[])
const dataTask = someLibTask().map(cleanUpData)

// step 2: fork the results to error log or persist method
// (d) => (), (d) => ()
dataTask.fork(console.error, persist)

Ideally I can chain (or map) the results from the dataTask into persist, which converts that individual task to an array of upsert tasks, which I can then wait on. I would love to see something like:

// step 1: make a query and return an array of objects
const dataTask = someLibTask().map(cleanUpData)
// step 2: chain the results into upsert
const upsertTasks = dataTask.chain(persist)
// step 3: wait on the upsert tasks, and then log results
Async.parallel(upsertTasks).fork(console.error, process.exit)

#### Error while training Prediction.io in naive classification?

Update Dec 7: I realized I need to change the files in my prediction app template. Can someone help me with creating a template for classification for strings using predictionio! So I am trying to train my model. I use the following code to add data:

import predictionio
client = predictionio.EventClient(
    access_key='VQGpZ8NnhdQOnRn1Qtg0zOZC4Exium5RvkFIplv7zNODMTs1uDm29rgxOdsMItlq',
    url='
filename = "docs.txt"
lines = open(filename).read().splitlines()
count = 0
for l in lines:
    l = l.split(',')
    plan = l[0]
    if len(plan) > 2:
        att = l[1].strip().split(' ')
        print plan
        print att
        client.create_event(
            event="set",
            entity_type="user",
            entity_id=5,
            properties={ "plan": str(plan) }
        )
        i = 0
        for x in att:
            a = "attr" + str(i)
            client.create_event(
                event="set",
                entity_type="user",
                entity_id=5,
                properties={ a: str(x) }
            )
            i = i + 1
        count = count + 1
        print count
        if count > 10:
            break

Here the docs.txt file has:

soy-oil, cwt call averag enter oct report corn matur level nation januari agricultur price wheat rate sorghum depart reflect cover loan reuter lb februari barley releas feedgrain juli reserv grain avg oat
barley, cwt call averag enter oct report corn matur level nation januari agricultur price wheat rate sorghum depart reflect cover loan reuter lb februari barley releas feedgrain juli reserv grain avg oat

I use the first word as plan and the rest of the words as attributes.
On executing pio train, I get the error:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 3, localhost): org.json4s.package$MappingException: Do not know how to convert JString(soy-oil) into double

### Lobsters

#### Macbook charger teardown: The surprising complexity inside Apple's power adapter

### CompsciOverflow

#### Algorithm: making queries on trees [on hold]

You are given a tree with N (1 \le N \le 10^5) vertices and N - 1 edges. The weight of every edge won't exceed 200. Design an algorithm to do Q (1 \le Q \le 10^5) operations of two types as fast as possible. First type: given three numbers v_i, d_i, r_i, place r_i objects in vertex v_i and in every vertex whose distance from v_i does not exceed d_i. Second type: given a vertex v_i, tell how many objects are in v_i. I am preparing for a programming/algorithmic competition and I got this task from my teacher. I have solved many problems concerned with graph theory, but I got completely stuck on this one. I have no idea other than brute force.

### Lobsters

#### Is it Pokemon or Big Data?

### DataTau

#### Not Even Scientists Can Easily Explain P-values

#### Is it name of Pokemon or Big Data tool?

### StackOverflow

#### multi view learning for text related problems

What is multi view learning? What are common multi view learning schemes adapted to text related problems?

### CompsciOverflow

#### How to add a new node in pastry algorithm [on hold]

When there is a given network of nodes, how is the routing table of each node created/initialized the first time? I want to ask how the routing table of a node is initialized when it is added into the logical ring. I have read about this algorithm from "Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems" by Antony Rowstron and Peter Druschel, and from two video lectures on YouTube by wandida.com.
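The tree-query problem above admits the brute-force baseline the asker already suspects: for each type-1 operation, traverse from v and add r objects to every vertex within weighted distance d, giving O(N) per update and O(1) per query. A minimal sketch under that assumption (all names illustrative, far from the intended complexity):

```python
from collections import deque

# Brute-force baseline: adjacency list maps v -> list of (neighbour, weight).
def make_tree(n):
    return {v: [] for v in range(n)}

def add_edge(adj, u, v, w):
    adj[u].append((v, w))
    adj[v].append((u, w))

# Type-1 operation: add r objects to every vertex within distance d of v.
# In a tree each vertex is reached by exactly one path, so a single
# traversal computes the (unique) distance from v.
def place(adj, objects, v, d, r):
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        objects[u] += r                      # dist[u] <= d by construction
        for nxt, w in adj[u]:
            if nxt not in dist and dist[u] + w <= d:
                dist[nxt] = dist[u] + w
                q.append(nxt)

adj = make_tree(4)
add_edge(adj, 0, 1, 2)
add_edge(adj, 1, 2, 3)
add_edge(adj, 1, 3, 10)
objects = [0, 0, 0, 0]
place(adj, objects, 0, 5, 7)   # vertices 0, 1, 2 lie within distance 5
```

A type-2 query is then just `objects[v]`. The small edge-weight bound (≤ 200) hints that the intended solution exploits bounded distances rather than this per-update traversal.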
#### What are the theoretical and practical contributions of Multiagent Systems to science?

Speaking about multiagent systems (MAS) is about as fuzzy as talking about artificial intelligence systems (AI). They are in essence the distributed counterpart of AI. While there is no so-called "AI theorem", AI research has given rise to many subfields, algorithms and scores of theorems (e.g. game solving, fuzzy logic, expert systems, A*, logic programming... as well as Bayesian networks and constraint satisfaction). But I fail to see a similar impact from MAS. As far as I know, all the subfields related to MAS preexisted them. For instance:

• results about distributed computing (e.g. the Fischer-Lynch-Paterson theorem, replication and load balancing strategies, decentralization, resilience, distributed algorithms...)
• results from operations research (e.g. the makespan measure in scheduling)
• results about voting (e.g. Arrow's theorem in social choice theory)
• results about competitive systems (e.g. Nash equilibrium in game theory)
• results about interoperability (e.g. ontologies in natural language processing)

As far as I have seen, "original" MAS contributions consist in the straightforward distribution of well-known problem solving algorithms into distributed ones, whose most notable change seems to be at the epistemological level. When the problem is decomposable, distribution actually consists in allocating subproblems to different agents:

• e.g. constraint satisfaction -> distributed constraint satisfaction: most notable change: variables now belong to agents, algorithms are unchanged.

When the problem is not decomposable, distribution consists in replicating problems at the level of each agent, or having a central agent solve it:

• e.g. reinforcement learning -> distributed reinforcement learning: agents apply the standard RL algorithm independently from each other.
• e.g. transport problem -> standard transport problem in operations research (no distribution)

The only really original MAS algorithm I can think of is the Contract Net Protocol, which is in essence just a broadcasting algorithm. The only design constraint introduced by MAS that I can think of is privacy. Multirobot systems, often given as an example of MAS, have been developing from standard robotics, largely ignoring the MAS literature. Therefore, what are the original contributions of MAS? Corollary question: Why are they relevant as a standalone research field rather than being a common placeholder name for different fields preexisting them?

### Lobsters

#### Angular 2, Immutability and Encapsulation

### CompsciOverflow

#### On Ladner's theorem

In Ladner's theorem, can we take \mathsf{NPI} problems to be problems in \mathsf{SUBEXP}\cap\mathsf{NP}? Can we extend Ladner's theorem to show that \mathsf{SUBEXP/poly}\cap\mathsf{NP/poly}\neq\emptyset holds?

#### How to calculate the Avd() value to test the traceability attack

In this paper, what is the value of Avd(ID0, ID1) that means this protocol is susceptible to a traceability attack? And how to calculate it?

### StackOverflow

#### How can Latent Semantic Indexing be used for feature selection?

I am studying some machine-learning and I have come across, in several places, that Latent Semantic Indexing may be used for feature selection. Can someone please provide a brief, simplified explanation of how this is done? Ideally both theoretically and in commented code. How does it differ from Principal Component Analysis? What language it is written in doesn't really worry me, just that I can understand both code and theory.

### Planet Emacsen

#### Irreal: Emacs for the CEO

Back when Josh Stella was coding, he lived in Emacs.
Like many of us, he performed most of his everyday tasks (mail, calendar, documents, coding) from within Emacs. Decades later, he'd become the CEO/co-founder of Luminal and had left Emacs behind. Like many developers (even former developers) he hates context switching, and that's what he found himself doing as he moved from application to application as he went about his day-to-day duties as a CEO. Each application had its own UI and its own set of shortcut keys. Recently, he decided to revisit Emacs and to try to do as many of his daily tasks as possible from within Emacs. Stella describes his new setup and writes about why other CEOs might want to try Emacs too. It's not for everyone, he admits, but if you're the right sort of person, Emacs can revolutionize your workflow and make you more efficient and happier. None of that will come as a surprise to us Emacsers, of course, but I wonder how many CEOs without a technical background will be willing to climb up the learning curve to get those benefits. To make that climb a bit easier, Stella spends some time describing how to install Emacs and goes over some of the basic navigation. If you're looking for reasons why a non-technical person might want to try Emacs, Stella's post is an excellent place to start.

### TheoryOverflow

#### Embedding distortion under group quotient

The high-level question is as follows: Suppose some group (here assumed to be a vector space \mathbf{F}_2^n) has a low-distortion embedding into l_1. Under what condition does the quotient of the group by a (normal) subgroup inherit this low-distortion embedding?
Formally: Let G be a group with a Cayley graph {\cal G} = Cay(G,S) for some generator set S\subseteq G, that can be embedded into the hypercube \mathbf{F}_2^n with low distortion, i.e. there exists i: G \hookrightarrow \mathbf{F}_2^n such that for any x,y\in V({\cal G}), \frac{1}{c} d_{\cal G}(x,y) \leq \Delta(i(x),i(y))\leq c d_{\cal G}(x,y), where \Delta(a,b) is the Hamming distance between a,b\in \mathbf{F}_2^n, and d_{\cal G}(x,y) is the length of a shortest path connecting x,y in {\cal G}. Suppose now that N\subseteq G is a (normal) subgroup of G. Is there a choice of generators S_N\subseteq G / N for which the quotient group G/N has a low-distortion embedding into l_1 as well?

#### Does solving matrix multiplication in quadratic time imply that SETH is false?

I have a little conjecture that if you could perform matrix multiplication (or solve 3-clique) in O(n^2 \log(n)) time, then you could solve CNF-SAT in O(2^{(1-\epsilon)n}) time. In other words, more efficient algorithms for matrix multiplication would imply more efficient algorithms for SAT, refuting the strong exponential time hypothesis (SETH). Questions: Has anyone ever thought about connecting the hardness of matrix multiplication to SETH? Secondly, does anyone think that there is or isn't a relationship? Why or why not?

### CompsciOverflow

#### Show that the problem L_{complement} is not decidable

I'm really new to computability theory and have the following problem. L_{complement}: Given two Turing machines M_1 and M_2, does L(M_1) = \overline{L(M_2)} hold? Is this decidable? I have really no clue where to start. My intuition tells me that the problem is undecidable and that I have to somehow reduce the halting problem to this problem in order to show that. My actual problem is that I have never done any proof like this and just don't know where to begin. Could you please give me some hints on how I should start? I'd really appreciate it if you could give me some help.
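One standard route into the undecidability question above (a sketch of a hint, assuming a reduction via universality rather than directly from halting) is:

```latex
% Sketch (one possible route, not the only one): fix M_2 = M_\emptyset,
% a machine that rejects every input, so L(M_\emptyset) = \emptyset.
% The instance (M_1, M_\emptyset) then asks exactly whether
% L(M_1) = \overline{\emptyset} = \Sigma^*, i.e. the universality
% problem for Turing machines, which is undecidable (it is itself
% obtained by reduction from the halting problem). Hence a decider
% for L_{complement} would decide universality, a contradiction.
\[
  M_1 \;\mapsto\; (M_1, M_\emptyset), \qquad
  L(M_1) = \overline{L(M_\emptyset)} \iff L(M_1) = \Sigma^*.
\]
```

So the halting problem enters the picture indirectly, through whichever undecidable special case one chooses to embed.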
#### Seems like NP cannot equal coNP by the definition of NP

A yes answer to an NP problem must be deterministically verifiable in polynomial time. The complement is that the no answer must be similarly verifiable. If the problem is NP-complete, there will generally be an intractable number of possibilities. How can it be possible to deterministically verify the no answer for all possibilities in polynomial time? As an example, would it not be necessary to address every non-empty subset to verify a no answer for the subset sum problem?

### TheoryOverflow

#### On Ladner's theorem

If Ladner's theorem is true, can we take \mathsf{NPI} problems to be problems in \mathsf{SUBEXP}\cap\mathsf{NP}? If Ladner's theorem is true, then does \mathsf{SUBEXP/poly}\cap\mathsf{NP/poly}\neq\emptyset hold?

#### Complexity of a problem over acyclic context-free grammars

Let G be an acyclic, context-free grammar over a fixed alphabet \Sigma=\{a_1,\dots,a_k\} with the restriction (without loss of generality) that |w|=2 for each rule A\to w in the grammar. Acyclic means that if N is the set of nonterminals, then $$\{(A,B)\in N^2\mid A\to xBy\text{ is a rule in }G\text{; }x,y\in(\Sigma\cup N)^*\}$$ is an acyclic relation. So L(G) is finite. Let in this setting the size of a grammar be defined as the number of nonterminals.

My question

Let \#_w(i,j) be the number of different subsequences of a_ia_j in w. For example, w=a_1a_1a_2a_2a_1 yields \#_w(1,2)=4, \#_w(2,1)=2, \#_w(1,1)=3 and \#_w(2,2)=1. Now I am looking for the complexity of:

Given: An acyclic, context-free grammar G and numbers n_{i,j} with i,j\in\{1,\cdots,k\}
Problem: Is there a word w\in L(G) with \#_w(i,j)=n_{i,j} for all i,j\in\{1,\cdots,k\}?

Background / Well-studied, related problem

The following problem is one step behind my question. In my question, the numbers of occurrences of all subsequences of length 2 are given. We could first ask, what is the complexity if the given numbers are the occurrences of subsequences of length 1, i.e.
the given numbers are the occurrences of the alphabet symbols. So, let \#_w(i) denote the number of occurrences of a_i in w\in\Sigma^*. The following problem is known to be \mathsf{NP}-complete:

Given: An acyclic, context-free grammar G and numbers n_1,\dots,n_k
Problem: Is there a word w\in L(G) with \#_w(i)=n_i for all i\in\{1,\cdots,k\}?

McKenzie and Wagner ("The Complexity of Membership Problems for Circuits over Sets of Natural Numbers") provide an \mathsf{NP} algorithm to solve the membership problem of circuits over the natural numbers with \cup and +. A slightly modified algorithm solves our problem. The algorithm in short: In addition to the given numbers, we guess for each nonterminal how often it occurs in a derivation tree and for each rule how often it is applied in a derivation tree. Afterwards, we check whether there is a derivation tree of G satisfying the guessed numbers by checking some relations between those numbers. The problem should also be \mathsf{NP}-hard, e.g. as a conclusion of Kopczyński and Widjaja To ("Complexity of Problems for Parikh Images of Automata").

Still \mathsf{NP}? Is the problem for subsequences of length 2 still solvable in \mathsf{NP}? Of course it is at least as hard as the related problem for subsequences of length 1, because $$\#_w(i,i)=\frac {\#_w(i)\cdot(\#_w(i)-1)}{2}.$$ Unfortunately, I am neither able to extend the algorithm of McKenzie and Wagner to get an \mathsf{NP} algorithm, nor can I show another hardness result like \mathsf{coNP}-hardness. Thanks for any help.

### UnixOverflow

#### Can't boot FreeBSD 10.1 using grub2

I installed FreeBSD 10.1 on a machine that has Xubuntu already. The installation completed successfully but I'm not able to boot it. On my grub boot screen I get only 2 options: ubuntu and advanced options. In particular I entered the Xubuntu partition to see what's going on.
If I do os-prober I get:

/dev/sda1:unknown Linux distribution:Linux:linux
/dev/sda2:unknown Linux distribution:Linux1:linux

This is my fdisk -l output:

root@diet-atlante:/home/user# fdisk -l
Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 testine, 63 settori/tracce, 121601 cilindri, totale 1953525168 settori
Unità = settori di 1 * 512 = 512 byte
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Identificativo disco: 0x2e70fb76

Dispositivo Boot      Start         End      Blocks   Id  System
/dev/sda1        1134323757  1448896490   157286367  a5  FreeBSD
Partition 1 does not start on physical sector boundary.
/dev/sda2   *    1448896554  1953525104  252314275+  a5  FreeBSD
Partition 2 does not start on physical sector boundary.
/dev/sda4          2048  1134323711   567160832    5  Esteso
/dev/sda5          4096  1002164223   501080064   83  Linux
/dev/sda6    1002166272  1022167039    10000384   82  Linux swap / Solaris

So I guess grub doesn't recognise the FreeBSD partition. Any solutions? Thanks

### TheoryOverflow

#### Natural NP-complete problems with high density?

(This question is related to a previous one, see the discussion in "Almost easy" NP-complete problems, but it may also be of independent interest, so I post it as a separate question.) Let us say that a language L\subseteq \{0,1\}^* has high density if it contains a positive constant fraction of all n-bit strings. That is, there is a constant c>0 such that $$|L\cap \{0,1\}^n|\geq c2^n$$ holds for all n. It is not hard to construct artificial examples of NP-complete problems with high density. For example, let L be any NP-complete language. For a binary string z, let w(z) denote the weight of z, which is the number of 1-bits in z. Now define $$L'=\{xx\,|\,x\in L\}\cup \{y\,|\, w(y)\:\mbox{is odd}\}.$$ It is easy to see that L' has high density, and it still remains NP-complete. The above example, however, is quite artificial; it is constructed for the sole reason of exhibiting this property.
I expected that one could easily find natural NP-complete problems with high density, and I was surprised that this turned out harder than I thought. So, the question is: What are some examples of natural NP-complete problems with high density? Edit: From the discussion in the comments I realized that a better question would be this: What are some examples of natural NP-complete problems with the property that both the yes-instances and the no-instances have high density?

### Lobsters

#### Your Mouse is a Database

### QuantOverflow

#### Example of optimal delta hedging in G. Barles, H.M. Soner option pricing paper

There is a paper by Guy Barles and Halil Mete Soner about option pricing with transaction costs. And there is a section about optimal (delta) hedging, which I do not fully understand. The conclusion of the optimal hedge section states that: So what is g() here? I couldn't find its definition previously. Can someone give a real (numeric) example of calculations according to this model? I mean something like: we sell a call/put with strike S, its delta is d, there are T days to expiration, etc. And after the calculation we get the interval [y1, y2].

### CompsciOverflow

#### Patriot Missile Software Bug (Range Gate Calculation)

Okay, so I was reading about the Patriot Missile software bug. To calculate the predicted range gate, the missile system relied on its timekeeping; however, it would convert the whole/integer time number it stored (say 3600000) to 24-bit form, multiplying it by 1/10 (0.00011001100110011001100) to produce the time in real format in seconds, this way 'truncating' the actual time it computed before the conversion to a 'slightly smaller number' due to the insufficient precision of the 24-bit limitation.
My question is not about the conversion or binary manipulation, but rather: how was the range gate calculated after taking the truncated time into account, for the purposes of establishing a distance/location/space in which to look for the incoming missile? Say the missile velocity is 2 km/s and the time it was detected is 100 hrs.

### StackOverflow

#### Pointing to source file from IDLE editor in python

I'm working from the book called "Building Machine Learning Systems with Python". I've downloaded some data from MLComp to use as a training set. The file I downloaded (20news-18828) is currently in /Users/jones/testingdocuments/379

The book instructs code as follows:

import sklearn.datasets
MLCOMP_DIR = r"D:\data"
data = sklearn.datasets.load_mlcomp("20news-18828", mlcomp_root=MLCOMP_DIR)
print(data.filenames)

I've tried changing MLCOMP_DIR = /Users/jones/testingdocuments/379 and various combinations thereof, but cannot seem to get to the file. What am I doing wrong?

### Lobsters

#### s2n and Lucky 13

### QuantOverflow

#### Innovative ways of visualizing financial data

Finance is drowning in a deluge of data. Humans are not very good at comprehending large amounts of data. One way out may be visualization. Traditional ways of visualizing patterns, complexities and contexts are of course charts and, for derivatives, e.g. payoff diagrams; a more modern approach is heat maps.

My question: Do you know of any innovative (or experimental) ways of visualizing financial and/or derivatives data?

#### Density forecast of a GARCH model

I am currently working on developing a series of density forecasts and I am encountering some problems. I am working on weekly S&P 500 returns, and the returns process is described as $r_{t} = \mu + \delta r_{t-1} + h_{t}z_{t}$, where $z_{t}$ comes from the Gaussian distribution. I am forecasting the returns and volatility of the series using the ARMAX-GARCH-K Toolbox in Matlab.
Initially I estimate the ARMA(1,0)-GARCH(1,1) model and obtain the one-step-ahead forecast of the returns and volatility. I obtain the $\mu$ parameter from the parameters of the GARCH model. As far as I understand, the next step to obtain the density forecast (assuming a Gaussian distribution) is to use the pdf of the Gaussian distribution, i.e. the normpdf(x, mu, sigma) function in Matlab. And here is the essence of my trouble: to obtain the density forecast, should I use the actual observed return as x, or the point forecast from the GARCH model? And should I use the $\mu$ parameter from the GARCH model as input for the mu parameter of normpdf, and the forecasted volatility as sigma in the normpdf function? I know this may be a basic question, but I cannot find any elementary examples on the internet.

Kind regards

### CompsciOverflow

#### If two languages together cover all words and one is regular, is the other one as well?

If $L_1 \subseteq \Sigma^*$, $L_2 \subseteq \Sigma^*$, $L_1$ is regular and $L_1 \cup L_2 = \Sigma^*$, then is $L_2$ necessarily regular?

I think that the answer is yes, but I'm not sure on my proof.

The reason that I think that $L_2$ is regular is because surely $L_2$ just accepts all the words in the language that $L_1$ doesn't? So, to me, that suggests that $L_2$ must be regular as well, I just don't know where to begin on a formal proof.

Any guidance would be appreciated.
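The step that fails in the attempted proof is "L2 just accepts all the words that L1 doesn't": the union condition only forces $L_2$ to *contain* the complement of $L_1$, not to equal it. A small Python illustration of the gap, using the classic non-regular language $\{a^n b^n\}$ (a sketch, with hand-written membership tests standing in for automata):

```python
# Counterexample sketch: L1 ∪ L2 = Σ* does not force L2 to be the
# complement of L1 -- L2 only has to contain everything L1 misses.
# Take L1 = Σ* (regular) and L2 = {a^n b^n : n >= 0}, which is
# non-regular by the pumping lemma. Their union is still Σ*.
from itertools import product

def in_L1(w):          # L1 = Σ*: accepts every string
    return True

def in_L2(w):          # L2 = {a^n b^n}: not regular
    n = len(w) // 2
    return len(w) % 2 == 0 and w == "a" * n + "b" * n

# Every string over {a,b} up to length 6 is in the union,
# yet L2 is non-regular -- so the implication in the question fails.
for length in range(7):
    for tup in product("ab", repeat=length):
        w = "".join(tup)
        assert in_L1(w) or in_L2(w)
print("union covers all strings up to length 6")
```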

#### Do Self Types make the Calculus of Inductive Constructions obsolete?

Self Types are an extension of the Calculus of Constructions [1] that allow the language to express algebraic datatypes encoded through the Scott Encoding. The Scott Encoding gives one the ability to pattern-match in O(1), which is one of the main motivators for the inclusion of inductive definitions in CC. Yet, Self Types make for a much simpler and more elegant base theory, and are seemingly no less powerful.

Do Self Types, from a theoretical point of view, make CIC obsolete, or is there still some aspect on which CIC is favorable in relation to Self Types?

#### How to count derangements? [migrated]

I was going through this link. Suppose that there are n persons who are numbered 1, 2, ..., n. Let there be n hats, also numbered 1, 2, ..., n. We have to find the number of ways in which no one gets the hat having the same number as his/her own. Let us assume that the first person takes hat i. There are n − 1 ways for the first person to make such a choice. There are now two possibilities, depending on whether or not person i takes hat 1 in return:

A) Person i does not take the hat 1. This case is equivalent to solving the problem with n − 1 persons and n − 1 hats: each of the remaining n − 1 people has precisely 1 forbidden choice from among the remaining n − 1 hats (i's forbidden choice is hat 1).

B) Person i takes the hat 1. Now the problem reduces to n − 2 persons and n − 2 hats.

From this, the following relation is derived: !n = (n - 1) (!(n-1) + !(n-2))

I have a hard time understanding point B and the equation derived. I was trying to understand with an example of 4 people named A, B, C and D and the corresponding hats A, B, C and D. Each of them is supposed to wear any hat except their own matching hat, i.e. A can wear B, C or D.

Suppose A wears B's hat; then we will be left with only 3 people and 3 hats. So I can understand the "(n − 1)·!(n − 1)" part, but what I didn't understand is part B (marked above). Can anyone help?
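For what it's worth, the intuition for case B is that once person i takes hat 1, both person 1 and person i are fully settled, and each of the remaining n − 2 people must avoid exactly one hat (their own): a fresh derangement of size n − 2. The recurrence can be sanity-checked mechanically against brute force:

```python
# Check the recurrence !n = (n-1) * (!(n-1) + !(n-2))
# against direct counting over all permutations.
from itertools import permutations

def subfactorial(n):
    # Base cases: !0 = 1 (the empty assignment), !1 = 0.
    if n == 0:
        return 1
    if n == 1:
        return 0
    return (n - 1) * (subfactorial(n - 1) + subfactorial(n - 2))

def count_derangements(n):
    # Count permutations with no fixed point directly.
    return sum(
        all(p[i] != i for i in range(n))
        for p in permutations(range(n))
    )

for n in range(8):
    assert subfactorial(n) == count_derangements(n)
print([subfactorial(n) for n in range(8)])  # → [1, 0, 1, 2, 9, 44, 265, 1854]
```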

### StackOverflow

#### Example of SOM and Neural Gas using Kohonen's python library

I'm trying to use Kohonen's python library in order to obtain a number of clusters by Neural Gas and SOM analysis, but I cannot find any examples on the Internet.

My input would be a matrix where each row corresponds to a time instant and the columns are the values of a specific variable in space (sea level pressure, geopotential). I would like to obtain the most representative clusters with both analyses, and I would also like to assign each spatial field to one of the groups to perform my classification.

I would be very grateful if someone could provide a similar example that I mentioned above.

Thank you very much

P. S. Sorry for my English

#### How to use underscore.js as a template engine?

I'm trying to learn about new uses of JavaScript as a server-side language and as a functional language. A few days ago I heard about node.js and the Express framework. Then I came across underscore.js as a set of utility functions, and I saw this question on StackOverflow. It says we can use underscore.js as a template engine. Does anybody know good tutorials on how to use underscore.js for templating, especially for beginners who have less experience with advanced JavaScript? Thanks

#### Unsupervised clustering with unknown number of clusters

I have a large set of vectors in 3 dimensions. I need to cluster these based on Euclidean distance such that all the vectors in any particular cluster have a Euclidean distance between each other less than a threshold "T".

I do not know how many clusters exist. At the end, there may be individual vectors that are not part of any cluster, because their Euclidean distance to each of the other vectors in the space is not less than "T".

What existing algorithms / approach should be used here?

Thanks Abhishek S
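One family of approaches worth naming here is hierarchical (agglomerative) clustering cut at distance T, or a greedy single-pass variant of it. The sketch below is a hypothetical greedy helper, not a library routine; it is order-dependent and makes no optimality claim, but it enforces the pairwise-threshold constraint by construction:

```python
# Greedy threshold clustering sketch: a vector joins a cluster only if
# it is within T of *every* existing member, so the pairwise-distance
# constraint holds by construction. Order-dependent and not optimal --
# complete-linkage hierarchical clustering cut at height T is the more
# principled tool for this requirement.
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def threshold_clusters(points, T):
    clusters = []
    for p in points:
        for c in clusters:
            if all(euclid(p, q) <= T for q in c):
                c.append(p)
                break
        else:
            clusters.append([p])   # p starts its own (possibly singleton) cluster
    return clusters

pts = [(0, 0, 0), (0.1, 0, 0), (5, 5, 5), (5, 5.1, 5), (100, 0, 0)]
print(threshold_clusters(pts, T=1.0))
# three clusters: two points near the origin, two near (5,5,5),
# and the lone outlier as a singleton
```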

#### Is the IO monad useful in Javascript [closed]

In Javascript it seems to be very natural to perform IO operations wherever you want. While I understand the purpose of the state monad (share changeable state) or the reader monad (share read only environment), I can't say that about the IO monad:

var IO = function (f) { // strictly speaking, this is a functor, not a monad
this.f = f;
}

IO.of = function (x) {
return new IO(function () {
return x;
});
}

function compose(f, g) {
return function (x) {
return f(g(x));
}
}

IO.prototype.map = function (f) {
return new IO(compose(f, this.f));
}


DrBoolean describes this benefit in his functional programming guide as follows:

Our pure code, ..., maintains its innocence and it's the caller who gets burdened with the responsibility of actually running the effects.

Although this is very helpful, I would like to have more specific examples. Are there use cases that can be implemented in Javascript solely with the monadic IO interface? Or is its use merely a matter of preference?

### DragonFly BSD Digest

The release candidate for DragonFly 4.4 is built and available for download.  The main site has it as an ISO or IMG file, and the mirrors should have it soon if not already.

Here’s a question I need feedback on: if we compressed these images using xz instead of bzip2 – would that inconvenience you?

### Fefe

#### Russlands Position mit dem abgeschossenen Flieger ist, ...

Russlands Position mit dem abgeschossenen Flieger ist, dass der Flieger dort die Öllieferungen von ISIS an/über die Türkei geprüft hat, und die Türkei wollte verhindern, dass die Russen die Öllieferungen ihrer Terrorbuddies von ISIS aufdecken.

### UnixOverflow

#### Cannot install PC BSD

Currently I am using Ubuntu. I wanted to try PC BSD so downloaded 10.2. On trying to install it on a separate primary partition(so that I can dual-boot) I am getting the following error.
Does anyone know how to fix it?

### QuantOverflow

#### How do right-to-break clauses affect CVA calculations

Does the presence of an optional/mandatory right-to-break clause affect CVA calculations, and if so, how?

Given two (otherwise identical) 10y swaps with the same counterparty, one of which has a right to break at 5y (ours), intuitively I'd say the one with the break clause should have a lower CVA - if the counterparty default spread takes a dive we can exit the trade at replacement cost. The question is how do we take that into account when calculating CVA?

Definitions:

• For the banks I deal with, when a break event occurs (regardless of whether the break was optional or mandatory) the trade is marked to market, a payment is made to whichever party is in the money & the trade is ripped up.

• Trades with a mandatory break are often "replaced" with a similar trade. That might be because a counterparty isn't allowed to have a 20Y trade on their books, for example.

### StackOverflow

#### Table normalization using functional dependency diagrams

We just learned how to do table normalization using functional dependency diagrams. I'm kind of confused and not sure if I'm doing it right. Here are two diagrams. The first is the one we had to normalize; the second is after 2NF and 3NF. Is it correct? Or where am I wrong?

http://i68.tinypic.com/281d4er.jpg

http://i67.tinypic.com/206dfcw.jpg

### QuantOverflow

#### How to calculate the conditional variance of a time series?

I am reading a paper where the term conditional variance is mentioned, but I am not really sure what is meant by this and how this can be calculated:

Fig. 2 shows the conditional variances of the centered returns of the series of prices under study.

As far as I know, the term conditional variance is used only in GARCH models. So, I assume that in order to calculate these variances one has to use a GARCH model for the returns. First, one has to calculate the returns $r_t = \ln(p_t) - \ln(p_{t-1})$. Then, the returns should be centered via $\hat{r}_t = r_t-\bar{r}$ (quite unsure if this is what is meant by centered). The last step would be to apply a GARCH model. Is this going in the right direction, or am I completely lost here?
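For concreteness, the recursion usually behind "conditional variance" plots is GARCH(1,1): $\sigma^2_t = \omega + \alpha r^2_{t-1} + \beta \sigma^2_{t-1}$. A bare-bones sketch of the pipeline described above, with illustrative (not estimated) parameter values:

```python
# Log returns, centering, then the GARCH(1,1) conditional-variance
# recursion sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}.
# omega/alpha/beta here are illustrative placeholders, not MLE estimates.
import math

def log_returns(prices):
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def garch_variances(r, omega=1e-5, alpha=0.08, beta=0.9):
    # Initialize with the sample variance of the (centered) returns.
    sigma2 = [sum(x * x for x in r) / len(r)]
    for t in range(1, len(r)):
        sigma2.append(omega + alpha * r[t - 1] ** 2 + beta * sigma2[-1])
    return sigma2

prices = [100, 101, 99.5, 102, 101.2, 103, 102.5]
r = log_returns(prices)
rbar = sum(r) / len(r)
centered = [x - rbar for x in r]      # the "centered returns"
cond_var = garch_variances(centered)
assert len(cond_var) == len(centered)
assert all(v > 0 for v in cond_var)
```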

#### Implied volatility as price transform

1. Implied volatility

The way I understand it, traders often think of implied volatility as a transformed price. So, in a way, the Black-Scholes model is considered a 'model-free' blackbox that takes a market price and returns an 'implied volatility'. A trader might very well say 'I bought AstraZeneca at 20 vol'. Why is it that they prefer implied vol as a price?

1. Implied vol surface

When you devise a new stochastic volatility model, you want it to match the empirical volatility surface as closely as possible (thereby matching the price surface as closely as possible because there is this one to one relationship between implied volatility and price). Why do quants prefer the implied vol surface instead of bespoke price surface?

Thanks!
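One reason the "20 vol" quote works at all: the Black-Scholes value is strictly increasing in volatility, so price and implied vol carry exactly the same information and either can serve as the quote. A minimal bisection sketch (illustrative; a production solver would typically use Newton's method with vega):

```python
# Black-Scholes call price and implied vol via bisection.
# Monotonicity of the price in sigma makes plain bisection sufficient.
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def implied_vol(price, S, K, T, r, lo=1e-6, hi=5.0, tol=1e-8):
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round-trip: price at sigma = 0.20, then recover 0.20 from the price.
price = bs_call(100, 100, 1.0, 0.01, 0.20)
assert abs(implied_vol(price, 100, 100, 1.0, 0.01) - 0.20) < 1e-4
```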

### StackOverflow

#### Can this be handled by machine learning in Spark MLlib

I have this scenario:

There is some historical data that I want to label with 0 and 1. I can label some of it with 1 exactly, but the rest can't be labeled 0 directly, because some of it should still be labeled 1.

I want to know:

• Can this be solved by machine learning?
• If ok, which algorithm(s) in Spark MLlib should I try?

### CompsciOverflow

#### DFA accepts common strings, reduction to NP-complete

$B=\{\left<M_1,M_2,\ldots,M_k\right> : \text{each } M_i \text{ is a DFA and all of the } M_i \text{ accept some common string}\}$

I'm trying to show that B is NP-complete. I know I have to reduce it to another NP-complete problem, but I'm having a lot of trouble coming up with the algorithm that decides B.

I was thinking I could keep track of all the strings accepted by $M_1$ and then check those in $M_2$; any that are accepted there I'd feed to $M_3$, and so on, until I either ran through all the DFAs (accept) or ran out of accepted strings (reject). I'm not really convinced this runs in NP time, though. How exactly do I prove that it does? Or is this a terrible algorithm?

Thank you!
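For intuition, membership in B can be decided by BFS over the product automaton (sketched below with a hypothetical dict encoding of a DFA, not a standard library type). The product has $|Q_1| \times \ldots \times |Q_k|$ states, exponential in the number of machines, which hints at where the hardness comes from when k is part of the input:

```python
# BFS over the product automaton: a tuple of per-DFA states is
# accepting iff every component is accepting in its own DFA.
# DFA encoding (hypothetical): {"start", "accepting", "delta"},
# where delta maps (state, symbol) -> state.
from collections import deque

def common_string_exists(dfas, alphabet):
    start = tuple(d["start"] for d in dfas)
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if all(s in d["accepting"] for s, d in zip(state, dfas)):
            return True
        for a in alphabet:
            nxt = tuple(d["delta"][(s, a)] for s, d in zip(state, dfas))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# D1 accepts strings with an even number of a's, D2 an odd number;
# together they accept no common string.
D1 = {"start": 0, "accepting": {0},
      "delta": {(0, "a"): 1, (1, "a"): 0}}
D2 = {"start": 0, "accepting": {1},
      "delta": {(0, "a"): 1, (1, "a"): 0}}
assert common_string_exists([D1], "a")
assert not common_string_exists([D1, D2], "a")
```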

#### Class of the language only containing the empty string?

$L = \left \{ \epsilon \right \}$
Clearly this language is finite so this must be a regular language.
Now since every regular language is Context Sensitive, $L$ is a CSL.
We can define the grammar for $L$ as :
$S\rightarrow \epsilon$
Now since $L$ is a CSL, this grammar must be a Context Sensitive Grammar. But from the definition of a Context Sensitive Grammar:

A Context sensitive grammar is any grammar in which the left side of each production is not longer than the right side.

But here
$\left | S \right | > \left | \epsilon \right |$
I am unable to figure out what's wrong here.

### Fred Wilson

#### A Chart To Ponder

I was looking at some charts this morning. This one of the NASDAQ got my attention.

You can see the wind we’ve had at our back since the financial crisis of 2008/2009. Seven years of good financial markets.

My career as a fund manager started in 1996. We had five years of good times followed by three years of bust. Then we had five years of good times followed by two tough ones. Now we’ve had seven years of good times.

I will leave it at that. Sorry to be a bummer this morning.

### StackOverflow

#### What is the mechanism by which functions with multiple parameter lists can (sometimes) be used with less than the required number of parameters?

Let me introduce this question by way of an example. This was taken from Lecture 2.3 of Martin Odersky's Functional Programming course.

I have a function to find fixed points iteratively like so

import math.abs

object fixed_points {
  println("Welcome to Fixed Points")

  val tolerance = 0.0001

  def isCloseEnough(x: Double, y: Double) =
    abs((x - y) / x) < tolerance

  def fixedPoint(f: Double => Double)(firstGuess: Double) = {
    def iterate(guess: Double): Double = {
      println(guess)
      val next = f(guess)
      if (isCloseEnough(guess, next)) next
      else iterate(next)
    }
    iterate(firstGuess)
  }
}


I can adapt this function to finding square roots like so

  def sqrt(x: Double) =
fixedPoint(y => x/y)(1.0)


However, this does not converge for certain arguments (like 4 for example). So I apply an average damping to it, essentially converting it to Newton-Raphson like so

  def sqrt(x: Double) =
fixedPoint(y => (x/y+y)/2)(1.0)


which converges.

Now average damping is general enough to warrant its own function, so I refactor my code like so

 def averageDamp(f: Double => Double)(x: Double) = (x+f(x))/2


and

  def sqrtDamp(x: Double) =
fixedPoint(averageDamp(y=>x/y))(1.0)              (*)


Whoa! What just happened?? I'm using averageDamp with only one parameter (when it was defined with two) and the compiler does not complain!

Now, I understand that I can use partial application like so

 def a = averageDamp(x=>2*x)_
a(3)  // returns 4.5


No problems there. But when I attempt to use averageDamp with less than the requisite number of parameters (as was done in sqrtDamp) like so

 def a = averageDamp(x=>2*x)                (**)


I get an error missing arguments for method averageDamp.

Questions:

1. How is what I have done in (**) different from (*), such that the compiler complains about the former but not the latter?
2. So it looks like using less than the requisite parameters is allowed under certain circumstances. What are these circumstances and what is the name given to this mechanism? (I realize this would come under the topic of 'currying', but I'm after the specific name of this subset of currying, as it were)

#### How to correctly understand behavior of RxJava's groupBy?

I'm pretty new to RxJava and FP in general. I want to write code to join two Observables. Let's say we have two sets of integers:

• [0..4] with key selector as modulo of 2, giving (key, value) = {(0,0), (1,1), (0,2),...}
• [0..9] with key selector as modulo of 3, giving (key, value) = {(0,0), (1,1), (2,2), (0,3), (1,4),...}

My steps to join them are as follows:

1. Group each set by its keys. The 1st set creates two groups with keys 0 and 1. The 2nd creates three groups with keys 0, 1 and 2.
2. Make a Cartesian product of two sets of groups, giving 6 pairs of groups in total with keys: 0-0, 0-1, 0-2, 1-0, 1-1, 1-2.
3. Filter only those pairs that have same keys on both sides, leaving only 0-0 and 1-1.
4. Within each pair, make a Cartesian product of left and right groups.

Below is the helper class to calculate Cartesian product:

public class Cross<TLeft, TRight, R> implements Observable.Transformer<TLeft, R> {
private Observable<TRight>      _right;
private Func2<TLeft, TRight, R> _resultSelector;

public Cross(Observable<TRight> right, Func2<TLeft, TRight, R> resultSelector) {
_right = right;
_resultSelector = resultSelector;
}

@Override
public Observable<R> call(Observable<TLeft> left) {
return left.flatMap(l -> _right.map(r -> _resultSelector.call(l, r)));
}
}


And here's the code to join:

Observable.range(0, 5).groupBy(i -> i % 2)
.compose(new Cross<>(Observable.range(0, 10).groupBy(i -> i % 3), ImmutablePair::new))
.filter(pair -> pair.left.getKey().equals(pair.right.getKey()))
.flatMap(pair -> pair.left.compose(new Cross<>(pair.right, ImmutablePair::new)))
.subscribe(System.out::println);


However, the output is not correct:

(0,0)
(0,3)
(0,6)
(0,9)
(1,1)
(1,4)
(1,7)


If I remove the line containing filter, there'll be no result at all. The correct output should be just like running this:

Observable.range(0, 5)
.compose(new Cross<>(Observable.range(0, 10), ImmutablePair::new))
.filter(pair -> pair.left % 2 == pair.right % 3)
.subscribe(System.out::println);


which gives:

(0,0)
(0,3)
(0,6)
(0,9)
(1,1)
(1,4)
(1,7)
(2,0)
(2,3)
(2,6)
(2,9)
(3,1)
(3,4)
(3,7)
(4,0)
(4,3)
(4,6)
(4,9)


Could someone explain the behavior? Many thanks.

Note: I use org.apache.commons.lang3.tuple.ImmutablePair in case you wonder.

### QuantOverflow

#### Pricing Forward Start Option with PDE

I am looking for references (books and papers) or suggestions on how to price forward starting calls using a PDE approach typically in the Heston model (In the BS world, the computation is trivial), with forward payoff $$\left(\frac{S_{t+\tau}}{S_t}-K\right)^{+},$$ where $t$ and $\tau$ are positive numbers.

I feel like the only way to use a PDE approach would be to identify the fundamental solution of the PDE in order to be able to apply the tower property on the expectation of the payoff.

All I have read up to now focuses on computing the characteristic function and the martingale approach.

### StackOverflow

#### How to remove redundant features using weka

I have around 300 features and I want to find the best subset of features by using feature selection techniques in weka. Can someone please tell me what method to use to remove redundant features in weka :)

### CompsciOverflow

#### Recovering a 3D point from a perspective projection [on hold]

Say I have a point $(P,Q,R)$ and that point lies in a plane with gradient $(A,B,C)^T$. If I know that the perspective projection of that point onto an image plane has coordinates $(x,y)$ and the focal distance is $f$, how can I express the coordinates of the original 3D point using the gradient coefficients and the projected point?

#### Algorithm: smallest weight string value from language

We have been asked to find an algorithmic solution for the following problem:

Assume a CFG $G$ with alphabet $\Sigma \subseteq \mathbb{Z}$. We define the weight of a string $w = k_1 \ldots k_n \in \Sigma^*$ as $||w|| = \sum\limits_{i=1}^n k_i$. The weight of a CFG $G$ over $\Sigma$ is then defined as $||G|| = \min \{||w|| : w \in L(G)\}$. The minimum of a set is defined in the standard way: assume that $\min(\emptyset) = \infty$, and if $S$ is a non-empty set which does not contain a minimal element, assume that $\min(S) = -\infty$. Also assume that $\min(\{-\infty\}\cup S) = -\infty$ and $\min(\{\infty\}\cup S) = \min(S)$.

Design an algorithm, that returns $||G||$ for given context-free grammar $G$.

/edit: We tried to create an algorithm that transforms the grammar into a proper grammar (eliminating cycles), then into CNF.

The algorithm then finds directly recursive rules and tries to evaluate the context around the recursive non-terminal (say $A \to xAy$): we find the value of $x$ and $y$, and then, if it is positive, we can replace the rule with $A \to xby$, where we assume that a rule $A \to b$ (or a similar rule whose right side can be evaluated) exists, or otherwise return $-\infty$. The idea gets a bit more complicated when there is indirect recursion, as in the CFG with rules $\{S \to AB \mid -9,\ A \to BC,\ B \to -1,\ C \to SB \mid 3\}$. I do not need a complete solution, just a small hint. Unfortunately Parikh's theorem has not worked so far. Thank you for any suggestions.
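One direction to explore (a sketch, not a verified algorithm): treat the grammar as a system of min-plus equations over the nonterminals and relax it Bellman-Ford style, with terminals contributing their integer values. Continued improvement after $|N|$ rounds is taken here as evidence of a usable weight-decreasing cycle, i.e. $-\infty$; both the round bound and the $-\infty$ test are heuristics borrowed from shortest paths and would need a rigorous argument for grammars:

```python
# Value iteration over nonterminals: W[A] = min over productions
# A -> s1...sk of sum of values of the s_i (terminals are ints,
# nonterminals are strings). Bellman-Ford-style -inf detection.
import math

def grammar_weight(prods, start):
    # prods: {nonterminal: [ [symbol, ...], ... ]}
    W = {A: math.inf for A in prods}

    def val(sym):
        return sym if isinstance(sym, int) else W[sym]

    def relax():
        changed = False
        for A, rules in prods.items():
            for body in rules:
                w = sum(val(s) for s in body)  # inf if any symbol unproductive
                if w < W[A]:
                    W[A] = w
                    changed = True
        return changed

    for _ in range(len(prods)):
        if not relax():
            break
    else:
        if relax():            # still improving after |N| full rounds
            return -math.inf   # heuristic: a weight-decreasing cycle
    return W[start]

# The indirect-recursion example from the question:
# S -> AB | -9,  A -> BC,  B -> -1,  C -> SB | 3
G = {"S": [["A", "B"], [-9]],
     "A": [["B", "C"]],
     "B": [[-1]],
     "C": [["S", "B"], [3]]}
print(grammar_weight(G, "S"))   # the S->A->C->S cycle keeps lowering the weight
```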

### StackOverflow

#### Rule of thumb for k value in K nearest neighbor [migrated]

I found that an often-used rule of thumb for kNN is that k equals the square root of the number of points in the training data set.

In my problem I have 300 features of 1000 users and I use 10 fold cross validation.

Can someone tell me what value I should consider to obtain the square root? Is it 300 or 1000?

### QuantOverflow

#### How to check that an interest rate curve is arbitrage free

I have 2 interest rate curves (LIBOR 3M and OIS). I want to create stress scenarios for those two curves. Is it possible that some scenarios will make my term structure arbitrageable? How can I test if, after I apply the shocks to the curves, I did break the no-arbitrage condition in the term structure?
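A sketch of one basic test, assuming a single continuously compounded zero curve and the classical no-negative-rates setting (the OIS/LIBOR multi-curve setup adds further conditions, e.g. on the basis between the curves): implied forward rates between adjacent tenors should stay non-negative, which is equivalent to discount factors being non-increasing in maturity:

```python
# Check a shocked zero curve for negative implied forwards.
# Convention here (illustrative): continuously compounded zero rates,
# so DF(t) = exp(-r(t) * t), and a negative forward between adjacent
# tenors shows up as an *increasing* discount factor.
import math

def discount_factors(tenors, zero_rates):
    return [math.exp(-r * t) for t, r in zip(tenors, zero_rates)]

def has_negative_forwards(tenors, zero_rates):
    dfs = discount_factors(tenors, zero_rates)
    return any(d1 > d0 for d0, d1 in zip(dfs, dfs[1:]))

tenors = [0.25, 0.5, 1.0, 2.0]
assert not has_negative_forwards(tenors, [0.01, 0.012, 0.015, 0.02])
# A shock that inverts the curve hard enough creates negative forwards:
assert has_negative_forwards(tenors, [0.05, 0.02, 0.005, 0.001])
```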

### CompsciOverflow

#### A non-mechanical way to get an infinite decidable subset of a Turing-recognizable language?

There's a famous theorem that every infinite Turing-recognizable language has an infinite decidable subset. The standard proof of this result works by constructing an enumerator for the Turing-recognizable language, then including the first enumerated string in the decidable language, then the first string that comes after it lexicographically, then the first string that comes after that lexicographically, etc. Since this set is enumerated by a Turing machine in lexicographically increasing order, it's decidable.

This construction works, but it doesn't seem to give a very "natural" example of an infinite decidable subset. In particular, the only way I can think of to describe the subset is to point at a specific enumerator for the language, define a recurrence relation from it, and then define the language from that recurrence relation.

Is there an alternative construction that produces an infinite decidable set from an infinite Turing-recognizable language that is less dependent on the particulars of how a specific enumerator runs?
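For reference, the standard construction is short enough to state executably. With a Python generator standing in for the TM enumerator and shortlex as the order (both illustrative choices), the kept subsequence is strictly increasing, which is exactly what makes the subset decidable:

```python
# From any enumerator, keep only strings strictly larger (in shortlex
# order) than everything kept so far: the kept subsequence is emitted
# in increasing order, hence membership in it is decidable.
def increasing_subsequence(enumerator, key=lambda s: (len(s), s)):
    best = None
    for s in enumerator:
        if best is None or key(s) > key(best):
            best = s
            yield s

# An "enumerator" that emits strings in no particular order:
raw = iter(["bb", "a", "abc", "ab", "aaaa", "bbbbb"])
print(list(increasing_subsequence(raw)))  # → ['bb', 'abc', 'aaaa', 'bbbbb']
```

The question's point stands, though: which strings survive depends entirely on the emission order of the particular enumerator.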

#### Question regarding the potential method for amortized analysis [on hold]

What does the following mean exactly for the potential method? Is this applicable to all situations?

If the potential is positive, then we overcharged for some operations. If it is negative, we are undercharged.

#### How could I guess that this problem has an algorithm with linear time complexity [on hold]

I was asked the following problem by a friend who is studying for job interviews.

Consider a list of $N$ non-negative integers, $l_k \in \mathbb{Z}^+$, and construct an algorithm that finds the smallest integer $a$ such that $a \ne \sum_{i \in S} l_i$ for every set $S$ of distinct indices (that is, find the smallest integer that cannot be written as a sum of a subset of the list elements). Additionally, I was told you can find an $\mathcal{O}(N)$ time complexity solution.

How could I guess that there should be a linear time complexity solution to this problem? That information drastically changes algorithms I might play with in my head.

He told me a friend of his who works for one of these tech companies knew immediately that there should be a linear time solution. Is there a mathematical reason to assume this, or is it just because the types of problems they ask in interviews can normally be solved in linear time?

Thanks!
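The structure that makes near-linear time plausible: scan values in increasing order while maintaining the largest prefix $[0, \text{reach})$ of representable sums. With a comparison sort this is $O(N \log N)$; the $O(N)$ claim presumably assumes bounded values so that a counting sort applies (an assumption, not something stated in the problem). A sketch:

```python
def smallest_non_sum(nums):
    # Invariant: after processing a sorted prefix, every integer in
    # [0, reach) is a subset sum. A new element x <= reach extends the
    # range to [0, reach + x); an element x > reach leaves 'reach'
    # itself unattainable, so we can stop.
    reach = 1
    for x in sorted(nums):
        if x > reach:
            break
        reach += x
    return reach

assert smallest_non_sum([1, 1, 1, 1]) == 5
assert smallest_non_sum([1, 2, 4, 8]) == 16
assert smallest_non_sum([2, 3]) == 1    # 1 is not a subset sum
assert smallest_non_sum([1, 3]) == 2    # sums are {0, 1, 3, 4}
```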

#### Shortest integer path [on hold]

An integer path in $\mathbb{Z}$ is a finite sequence of integers $a_1 \to a_2 \to a_3 \to \ldots \to a_n$. The length of an integer path is the sum of the absolute differences between its adjacent elements, $|a_1 - a_2| + |a_2 - a_3| + \ldots + |a_{n-1} - a_n|$.

Integers between two elements in the sequence are implied to be in the path. The integer $t$ is contained in the path as long as there are adjacent elements $a_k$ and $a_{k+1}$ where $a_k \le t \le a_{k+1}$ or $a_k \ge t \ge a_{k+1}$. Note that this means paths with redundant intermediate elements are equivalent to paths that omit those elements. If $x \le y \le z$ and we have a path containing $\ldots \to x \to y \to z \to \ldots$ or $\ldots \to z \to y \to x \to \ldots$, then you can drop the $y$ without changing the implied path.

Let $i_k = min \{i| a_i \le k \le a_{i + 1}\}$ for $k$ in path $a_1 \to \ldots \to a_n$.

Then the length $\rho(k_1, k_2)$ of the integer path segment between $k_1$ and $k_2$ is the length of the integer path $$k_1 \to a_{i_{k_1} + 1} \to \ldots \to a_{i_{k_2}} \to k_2$$

In particular, the segment length satisfies $\rho(x,z) \le \rho(x,y) + \rho(y,z)$ and $\rho(x,y) \ge |x - y|$.

Let $x_1, l_1, \ldots, x_n, l_n$ be positive integer numbers, $x_1 < \ldots < x_n$.

We need to find the shortest integer path in $\mathbb{Z}$ beginning at some positive integer $x$ with the properties that $x_1, \ldots, x_n$ are in the path and the length of the path segment (not the line segment!) between $x$ and $x_i$ is at most $l_i$.

I want a good algorithm to find the length of this shortest path, or to report an error if no such path exists.

Desired asymptotics: if $\max_i (x_i) < n^2$, then the running time should be $O(n^2)$.

First example

$(x_1, \ldots, x_5) = (1, 3, 5, 8, 10)$, $(l_1, \ldots, l_5) = (3, 1, 8, 19, 15)$.

Shortest path $$x = x_2 = 3(length = 0 < 1) \to 1(length = 2 < 3) \to 5(length = 6 < 8) \to 8(length = 9 < 19) \to 10(length = 11 < 15)$$

Shortest path length = $11$.

Second example

$(x_1, x_2, x_3, x_4, x_5) = (1, 2, 3, 4, 5)$, $(l_1, l_2, l_3, l_4, l_5) = (5, 1, 2, 4, 3)$.

No solution: the path must begin with $x = x_2 = 2 \to x_5 = 5$ (otherwise $length(x,x_2) > l_2$ or $length(x,x_5) > l_5$), but $|x_5 - x_1| > l_1 - length(x,x_5)$.

My ideas

First, we can use algorithm:

1) Sort $l = (l_1,\ldots, l_n)$

2) Let $sort(l) = (l_{i_1}, \ldots, l_{i_n})$.

Then our path is $x = x_{i_1} \to \ldots \to x_{i_n}$.

Counterexample:

$(x_1, x_2, x_3) = (1, 2, 3)$, $(l_1, l_2, l_3) = (2, 1, 3)$

With this algorithm, the length of our path is $3$, but $x_1 \to x_2 \to x_3$ is valid too and its length is $2$.

Algorithms like brute force or enumeration of all correct paths have bad asymptotics.

I don't have any good ideas for this question.

Thank you for any help!

#### Can the isomorphic graph problem be solved in deterministic polynomial time?

Here is a recent homework problem of mine:

Call graphs G and H isomorphic if the nodes of G may be reordered so that it is identical to H. Let ISO = {⟨G,H⟩| G and H are isomorphic graphs}. Show that ISO ∈ NP.

In order to solve this problem, I created a non-deterministic Turing machine that did the following:

1. Check if each graph has the same number of nodes
2. Non-deterministically arrange the edges of G in a permutation that matches H
3. Verify that each edge in the permutation of G matches the corresponding edge in H

As far as I know, this approach is acceptable for solving this problem with non-determinism.

My question is this: if we can compare the edges in the permutation of G against H, why can't we simply check to see if each edge in the original list of edges in G (a maximum of $n^2$ edges) has a matching edge in H (which also has a maximum of $n^2$ edges), for a total big O of $O(n^4)$? If this is possible, does that mean that this problem is in P?
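The gap in the $O(n^4)$ idea is that edges must match under one single relabeling of the nodes, not each independently. Two 2-regular graphs on six nodes make the point: every local edge-matching and degree check passes, yet they are not isomorphic. A brute-force sketch that makes the hidden permutation explicit (that $n!$ search is exactly the part the edge-by-edge comparison skips):

```python
# Brute-force isomorphism: try every node permutation and see whether
# it maps the edge set of G onto the edge set of H.
from itertools import permutations

def isomorphic(n, E_G, E_H):
    EG = {frozenset(e) for e in E_G}
    EH = {frozenset(e) for e in E_H}
    if len(EG) != len(EH):
        return False
    for p in permutations(range(n)):          # n! candidate relabelings
        if {frozenset((p[u], p[v])) for u, v in EG} == EH:
            return True
    return False

C6     = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]   # one 6-cycle
two_C3 = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]   # two triangles
# Both have six edges and every node has degree 2, yet:
assert not isomorphic(6, C6, two_C3)
# The same cycle with its edge list reordered is, of course, isomorphic:
assert isomorphic(6, C6, [(1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 1)])
```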

### StackOverflow

#### Azure ML Recommendations

I want to use Azure ML to find related products using information from receipts from a store.

I got a file of receipts:

44366,136778
79619,88975
78861,78864
53395,78129,78786,79295,79353,79406,79408,79417,85829,136712
32340,33973
31897,32905
32476,32697,33202,33344,33879,34237,34422,48175,55486,55490,55498
17800
32476,32697,33202,33344,33879,34237,34422,48175,55490,55497,55498,55503
47098
136974
85832


Each row represents one receipt and each number is a product id.

Given a product id, I want to get a list of similar products, i.e. products that were bought together by other customers.

Can anyone point me in the right direction of how do to this?
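Before reaching for a full recommender, the core signal can be computed directly: count how often products co-occur on the same receipt. The sketch below is a plain Python illustration (a hypothetical helper, not an Azure ML API); Azure ML's recommendation modules learn a model on top of essentially this kind of co-occurrence data:

```python
# Frequent co-occurrence on receipts as a simple "bought together" proxy.
from collections import Counter

def related_products(receipts, product_id, top=3):
    co = Counter()
    for receipt in receipts:
        if product_id in receipt:
            co.update(p for p in receipt if p != product_id)
    return [p for p, _ in co.most_common(top)]

# A few receipts shaped like the question's data (each row = one receipt):
receipts = [
    [44366, 136778],
    [32476, 32697, 33202, 33344],
    [32476, 32697, 55498],
    [32476, 55498],
]
print(related_products(receipts, 32476))  # 32697 and 55498 co-occur most
```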

### UnixOverflow

#### Number of cores on FreeBSD

What would the equivalent of Debian's nproc be in FreeBSD? I'm trying to include this in a bash script variable so if it could just print out the number of cores that would be fantastic.

### StackOverflow

#### Why does Spark ML NaiveBayes output labels that are different from the training data?

I use the NaiveBayes classifier in Apache Spark ML (version 1.5.1) to predict some text categories. However, the classifier outputs labels that are different from the labels in my training set. Am I doing it wrong?

Here is a small example that can be pasted into e.g. Zeppelin notebook:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.sql.Row

// Prepare training documents from a list of (id, text, label) tuples.
val training = sqlContext.createDataFrame(Seq(
(0L, "X totally sucks :-(", 100.0),
(1L, "Today was kind of meh", 200.0),
(2L, "I'm so happy :-)", 300.0)
)).toDF("id", "text", "label")

// Configure an ML pipeline, which consists of three stages: tokenizer, hashingTF, and lr.
val tokenizer = new Tokenizer()
.setInputCol("text")
.setOutputCol("words")
val hashingTF = new HashingTF()
.setNumFeatures(1000)
.setInputCol(tokenizer.getOutputCol)
.setOutputCol("features")
val nb = new NaiveBayes()

val pipeline = new Pipeline()
.setStages(Array(tokenizer, hashingTF, nb))

// Fit the pipeline to training documents.
val model = pipeline.fit(training)

// Prepare test documents, which are unlabeled (id, text) tuples.
val test = sqlContext.createDataFrame(Seq(
(4L, "roller coasters are fun :-)"),
(5L, "i burned my bacon :-("),
(6L, "the movie is kind of meh")
)).toDF("id", "text")

// Make predictions on test documents.
model.transform(test)
.select("id", "text", "prediction")
.collect()
.foreach { case Row(id: Long, text: String, prediction: Double) =>
println(s"($id,$text) --> prediction=$prediction") }

The output from the small program:

(4, roller coasters are fun :-)) --> prediction=2.0
(5, i burned my bacon :-() --> prediction=0.0
(6, the movie is kind of meh) --> prediction=1.0

The set of predicted labels {0.0, 1.0, 2.0} is disjoint from my training set labels {100.0, 200.0, 300.0}.

Question: How can I map these predicted labels back to my original training set labels?

Bonus question: why do the training set labels have to be doubles, when any other type would work just as well as a label? Seems unnecessary.

#### C5.0 returns tree of size 1

I'm working with LendingClub's publicly available data on approved loans from 2012-2015: https://www.lendingclub.com/info/download-data.action

What it looks like:

library(C50)
library(dplyr)
library(ggplot2)

#Preprocessed in terminal with 'tail -n +2 LoanStats3c.csv > approved.csv'
setwd("~/Dropbox/dataprojects/Lending Club/")
approval <- read.csv("approved.csv")
approved_2014 <- read.csv("approved_2014.csv")
approved_2013 <- read.csv("approved_2013.csv")

#Create a big dataframe
approved <- rbind(approval, approved_2014, approved_2013)

#Filter it
approved <- filter(approved, application_type == "INDIVIDUAL", purpose=="debt_consolidation")
keep <- c("loan_status","addr_state", "annual_inc", "delinq_2yrs", "dti", "grade","sub_grade", "home_ownership", "emp_length", "loan_amnt", "installment", "int_rate", "open_acc", "zip_code", "inq_last_6mths", "verification_status", "pub_rec", "term", "revol_bal", "revol_util")
approved <- approved[keep]

#Scramble it
set.seed(12345)
approved <- approved[order(runif(429949)), ]

#Drop unused levels -- necessary for C5.0
approved <- droplevels(approved)
num_examples = dim(approved)[1]

#Prepares class variables.
approved$loan_status <- as.character(approved$loan_status) approved$loan_status[approved$loan_status=="Charged Off"] <- "0" approved$loan_status[approved$loan_status=="Default"] <- "0" approved$loan_status[approved$loan_status=="Late (31-120 days)"] <- "0" approved$loan_status[approved$loan_status=="Current"] <- "1" approved$loan_status[approved$loan_status=="Fully Paid"] <- "1" approved$loan_status[approved$loan_status=="Issued"] <- "1" approved$loan_status[approved$loan_status=="In Grace Period"] <- "1" approved$loan_status[approved$loan_status=="Late (16-30 days)"] <- "1" approved$loan_status[approved$loan_status==""] <- "missing" approved$loan_status <- as.factor(approved$loan_status) summary(approved$loan_status)

#Change these to numerics so they can be normalized
approved$int_rate <- as.numeric(sub("%", "", approved$int_rate))
approved$revol_util <- as.numeric(sub("%", "", approved$revol_util))
approved$term <- as.numeric(sub("months", "", approved$term))

#Normalize Numeric Columns
ind <- sapply(approved, is.numeric)
approved[ind] <- lapply(approved[ind], scale)
levels(approved)[levels(approved) == ""] <- "missing"

#Create Train/Test Split
split_pt2 = ceiling(num_examples*0.7)
split_pt1 = floor(num_examples*0.7)
approved_train <- approved[1:split_pt1,]
approved_test <- approved[split_pt2:num_examples,]

train_class <- approved_train$loan_status
test_class <- approved_test$loan_status
#Remove the class from the training/testing data
approved_train <- approved_train[,2:20]
approved_test <- approved_test[,2:20]

m <- C5.0(approved_train, train_class, trials=10)
summary(m)
p <- predict(m, approved_test)
summary(p)


I'm new to R and C5.0, so I might be overlooking something very obvious. I'm not sure what's wrong here. Thanks for any insight you can provide.

Call:
C5.0.default(x = approved_train, y = train_class, trials = 10)

C5.0 [Release 2.07 GPL Edition]     Tue Nov 24 16:38:41 2015
-------------------------------

Class specified by attribute `outcome'

Read 128985 cases (20 attributes) from undefined.data

-----  Trial 0:  -----

Decision tree:
1 (128985/8250)

-----  Trial 1:  -----

Decision tree:
1 (128985/36371.2)

*** boosting reduced to 1 trial since last classifier is very inaccurate

*** boosting abandoned (too few classifiers)

Evaluation on training data (128985 cases):

Decision Tree
----------------
Size      Errors

1 8250( 6.4%)   <<

(a)    (b)    <-classified as
-----  -----
8250    (a): class 0
120735    (b): class 1
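As an aside (my own check, not from the question): the trial-0 tree `1 (128985/8250)` is consistent with a single leaf that always predicts the majority class. With 8250 cases of class 0 and 120735 of class 1 (the counts from the output above), always answering "1" is already about 93.6% accurate, matching the reported 6.4% error. A quick sketch of that baseline:

```python
# Sketch: majority-class baseline from the class counts in the C5.0 output.
counts = {"0": 8250, "1": 120735}
total = sum(counts.values())              # 128985 cases
majority = max(counts, key=counts.get)    # the class a size-1 tree predicts
baseline_acc = counts[majority] / total
error = 1 - baseline_acc
print(f"majority class = {majority}, baseline error = {error:.1%}")
```

A large class imbalance like this is one common reason a decision tree collapses to a single node.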


#### TensorFlow installation on Ubuntu

When I try to install the TensorFlow machine learning library on Ubuntu (vmware image) using the command:

$ pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl

after downloading the package I got this error:

Traceback (most recent call last):
  File "<string>", line 14, in <module>
IOError: [Errno 2] No such file or directory: '/tmp/pip-GgS7fR-build/setup.py'
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "<string>", line 14, in <module>
IOError: [Errno 2] No such file or directory: '/tmp/pip-GgS7fR-build/setup.py'

I am using pip, Python 2.7 and an Ubuntu 12.04 LTS vmware image. Can anyone please help me solve this error?

Full pip.log error:

------------------------------------------------------------
/usr/bin/pip run on Sun Nov 22 06:06:30 2015
Downloading/unpacking https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl
  Downloading from URL https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl
  Running setup.py egg_info for package from https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl
    Traceback (most recent call last):
      File "<string>", line 14, in <module>
    IOError: [Errno 2] No such file or directory: '/tmp/pip-GgS7fR-build/setup.py'
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 14, in <module>
    IOError: [Errno 2] No such file or directory: '/tmp/pip-GgS7fR-build/setup.py'
----------------------------------------
Command python setup.py egg_info failed with error code 1
Exception information:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/pip/basecommand.py", line 126, in main
    self.run(options, args)
  File "/usr/lib/python2.7/dist-packages/pip/commands/install.py", line 223, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "/usr/lib/python2.7/dist-packages/pip/req.py", line 980, in prepare_files
    req_to_install.run_egg_info()
  File "/usr/lib/python2.7/dist-packages/pip/req.py", line 216, in run_egg_info
    command_desc='python setup.py egg_info')
  File "/usr/lib/python2.7/dist-packages/pip/__init__.py", line 255, in call_subprocess
    % (command_desc, proc.returncode))
InstallationError: Command python setup.py egg_info failed with error code 1

#### TensorFlow installation on Ubuntu 14.04 LTS

When I try to install TensorFlow, Google's machine learning library, on Ubuntu using the command:

$ pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl


I keep getting this error:

tensorflow-0.5.0-cp27-none-linux_x86_64.whl is not a supported wheel on this platform.
Storing debug log for failure in /home/user/.pip/pip.log


I am using pip, and Python 2.7 is also installed on the machine.

vmware image info :

No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04 LTS
Release: 14.04
Codename: trusty
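Not part of the question, but the wheel's filename itself encodes why pip can reject it: per the wheel format (PEP 427), `cp27` (CPython 2.7), `none` (ABI) and `linux_x86_64` must all match tags the local interpreter supports. A minimal sketch of decoding those tags from the filename:

```python
# Sketch: decode the compatibility tags baked into a wheel filename (PEP 427).
wheel = "tensorflow-0.5.0-cp27-none-linux_x86_64.whl"
name, version, python_tag, abi_tag, platform_tag = wheel[:-len(".whl")].split("-")
print(name, version)             # package name and version
print(python_tag)                # cp27 -> CPython 2.7
print(abi_tag, platform_tag)     # none linux_x86_64
```

If pip reports "not a supported wheel on this platform", one of these three tags does not match the running interpreter (for example, a Python 3 pip cannot install a `cp27` wheel).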

#### How to get randomForest model output in probability using Caret?

I am trying to use Caret to build a random forest model for binary classification. I have used the randomForest package to do this in the past and it worked fine, but with Caret my output is binary rather than a probability. I am using the same syntax (I hope) for both. This is what I used to get with the randomForest package.

>fit = randomForest(x = a[,-1], y = as.factor(a[,1]),ntree=120)
>head(predict(fit, newdata = test_data[,-c(1:2)], type = "prob")[,2])
1          2          3          4          5          6
0.04166667 0.03333333 0.55833333 0.80000000 0.87500000 0.04166667


Now, using Caret I am trying to do the same, but it's not accepting type='prob' in the predict function, giving me the error

>rf_model<-train(x = a[,-1], y = as.factor(a[,1]),method="rf",ntree=120)
Error in [.data.frame(out, , obsLevels, drop = FALSE) :
undefined columns selected


When I take out the type argument instead, it gives me

>head(predict(rf_model, test_data[,-c(1:2)]))
[1] 0 0 1 1 1 0
Levels: 0 1


How do I get output in probabilities?

I need to create multiple algorithms after this, and I think Caret would be a more homogeneous way to do that. I am sure I am missing something here, but being new to Caret I don't know what.

### QuantOverflow

#### VIX options historical data

I'm looking at these data: call and put options on VIX. I'm interested in daily quotes (all strikes and maturities) for at least 2009/10. Could you list links to possible sources, and possibly experience-based download instructions? Most sources I've already checked let you download VIX options data only in addition to S&P 500 options data, which I do not need. Thanks in advance. Any suggestion is more than welcome.

### StackOverflow

#### Can I call H2O library functions directly from Java, or is R the only option for H2O?

I want to use machine learning algorithms in Java. Mahout with Hadoop is too slow, and Weka is not able to cope with the large data size. So is it possible to call the H2O library from Java, or is there any better option available for Java?

### StackOverflow

#### Ideal k value in kNN for classification

I am doing a classification (not clustering). Can I use the kNN algorithm for this? What is the ideal k value to test? In some Stack Overflow answers I saw advice to use the square root of the number of features, but where does this rule come from? Can someone please help me :)
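As an aside (my own sketch, not part of the question): the heuristic usually quoted is the square root of the number of training *samples*, and either way it is only a starting point; in practice k is tuned by cross-validation. A self-contained toy example that picks k by leave-one-out accuracy on a hypothetical 1-D dataset:

```python
from collections import Counter

# Tiny hypothetical 1-D dataset: (feature, label) pairs.
data = [(1.0, "a"), (1.2, "a"), (1.4, "a"), (3.0, "b"), (3.2, "b"), (3.4, "b")]

def knn_predict(train, x, k):
    # Vote among the k nearest training points.
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k):
    # Leave-one-out: predict each point from all the others.
    hits = sum(knn_predict(data[:i] + data[i + 1:], x, k) == y
               for i, (x, y) in enumerate(data))
    return hits / len(data)

scores = {k: loo_accuracy(data, k) for k in (1, 3, 5)}  # odd k avoids ties
best_k = max(scores, key=scores.get)
print(scores, "best k =", best_k)
```

On this toy set, k = 5 forces every point to be outvoted by the opposite class, so cross-validation (not a fixed formula) is what reveals the right k for a given dataset.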

#### Caffe training without testing

I am using Caffe to train AlexNet on a known image database. I am benchmarking and want to exclude a testing phase.

Here is the solver.prototxt for AlexNet:

net: "models/bvlc_alexnet/train_val.prototxt"
test_iter: 1000
test_interval: 1000
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 450000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "models/bvlc_alexnet/caffe_alexnet_train"
solver_mode: GPU


While I have never found a definitive doc that detailed all of the prototxt options, comments within Caffe tutorials indicate this "test_interval" represents the number of iterations after which we test the trained network.

I figured that I might set it to zero to turn off testing. Nope.

F1124 14:42:54.691428 18772 solver.cpp:140] Check failed: param_.test_interval() > 0 (0 vs. 0)
*** Check failure stack trace: ***


So I set test_interval to one million, but of course Caffe still tests the network at iteration zero.

I1124 14:59:12.787899 18905 solver.cpp:340] Iteration 0, Testing net (#0)
I1124 14:59:15.698724 18905 solver.cpp:408]     Test net output #0: accuracy = 0.003


How do I turn testing off while training?
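Not an official recipe, just a sketch under two assumptions: Caffe's SolverParameter also has a `test_initialization` flag that controls the iteration-0 test pass, and the `test_iter`/`test_interval` fields can be dropped entirely when the net prototxt contains no TEST-phase layers. A hypothetical train-only solver might look like:

```protobuf
# Hypothetical train-only solver.prototxt sketch; assumes train_only.prototxt
# is a variant of train_val.prototxt with all phase: TEST layers removed.
net: "models/bvlc_alexnet/train_only.prototxt"
test_initialization: false   # skip the test pass at iteration 0
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 450000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "models/bvlc_alexnet/caffe_alexnet_train"
solver_mode: GPU
```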

### UnixOverflow

#### What is the proper way to create a VirtualBox image using NanoBSD?

Based on the instructions at https://www.freebsd.org/doc/en/articles/nanobsd/howto.html, I am trying to construct a VirtualBox-format hard drive file from an image generated by NanoBSD. When the VM is started, it ends up with a black screen and a frozen cursor at top left after the VirtualBox BIOS screen.

Steps I have taken:

• from inside a FreeBSD x86_64 VM
• run the nanobsd script with defaults (generic kernel and no custom config file)
• from host machine
• copy /usr/obj/nanobsd.full/_.disk.full to host
• VBoxManage convertfromraw --format VDI _.disk.full nanobsd.vdi
• attach the vdi file to IDE on an empty FreeBSD x86_64 vm
• start the vm

Can someone advise on errors or missing steps?

### StackOverflow

#### How to calculate information gain for below data set?

While learning information gain calculation I came across this exercise: the probability of cancer in a population is 1%. A test for cancer correctly identifies cancer patients with a probability of 50% and non-cancer patients with a probability of 99.5%. Now I have to calculate the information gain obtained using this cancer test. This is one of the exercise questions I am trying to solve while learning entropy and information gain.

Edit - my attempt at the calculation:

If we consider total population as 100 -
Cancer patients = 1
Non-cancer patients = 99
Entropy H = -1/100 log(1/100) - 99/100 log(99/100)

Now the test on cancer patients gives me 50% cancer patients and 50% non-cancer patients. Hence the entropy of classification for cancer patients is

H1 = -1/2 log(1/2) - 1/2 log(1/2)


For non-cancer patients it gives 99.5% non-cancer patients and 0.5% cancer patients. The entropy of classification for non-cancer patients is

H2 = -(99.5*99/100) log(99.5*99/100) - (0.5*99/100) log(0.5*99/100)

I want to know whether this is the correct way to get the entropy after the test. If it is, the information gain can be calculated as

Information gain = H - (H1+H2)
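As a cross-check (my own sketch, not part of the exercise): under the mutual-information reading, the gain is the prior entropy minus the *expected* posterior entropy, where the posteriors come from Bayes' rule with the stated 50% sensitivity and 99.5% specificity:

```python
from math import log2

def entropy(p):
    """Binary entropy in bits of a probability p."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

prior = 0.01                    # P(cancer)
sens, spec = 0.50, 0.995        # P(+ | cancer), P(- | no cancer)

p_pos = prior * sens + (1 - prior) * (1 - spec)   # P(test positive)
post_pos = prior * sens / p_pos                   # P(cancer | +)
post_neg = prior * (1 - sens) / (1 - p_pos)       # P(cancer | -)

gain = entropy(prior) - (p_pos * entropy(post_pos) +
                         (1 - p_pos) * entropy(post_neg))
print(f"H(prior) = {entropy(prior):.4f} bits, gain = {gain:.4f} bits")
```

Note the two branch entropies are weighted by the branch probabilities before subtracting, rather than summed unweighted.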


#### Feature mapping using multi-variable polynomial

Consider a data-matrix of $N$ data points $\mathbf{x} = [x_1, x_2]^T$ that we want to map into a higher-dimensional feature space. We can do this using $d$-degree polynomials. Thus, for a sequence of $N$ data points the new data-matrix is

$\begin{bmatrix} x_1 & x_2 & x_1^2 & x_1x_2 & x_2^2 & x_1^3 & x_1^2 x_2 & x_1 x_2^2 & x_2^3 & \dots\\[0.3em] x_1 & x_2 & x_1^2 & x_1x_2 & x_2^2 & x_1^3 & x_1^2 x_2 & x_1 x_2^2 & x_2^3 & \dots\\[0.3em] \vdots & \vdots & & & & & & & & \vdots \\[0.3em] x_1 & x_2 & x_1^2 & x_1x_2 & x_2^2 & x_1^3 & x_1^2 x_2 & x_1 x_2^2 & x_2^3 & \dots\\[0.3em] \end{bmatrix}\in \mathbf{R}^{N \times 2^{d} + 1 }$

I have studied a relevant script (Andrew Ng's online course) that makes such a transform for 2-dimensional data points into a higher feature space. However, I could not figure out a way to generalize it to samples of arbitrary dimension, $\mathbf{x} = [x_1, x_2, \dots, x_D]$. Here is the code:

d = 6;
m = size(D, 1);          % number of samples
x1 = D(:, 1);            % first feature column
x2 = D(:, 2);            % second feature column
new = ones(m, 1);        % bias column (ones(m) would build an m-by-m matrix)
for k = 1:d
    for l = 0:k
        new(:, end+1) = (x1.^(k-l)).*(x2.^l);
    end
end


Can we vectorize this code? Also, given a data-matrix $\mathbf{D} \in \mathbf{R}^{N \times 2}$, could you please suggest a way to transform data points of arbitrary dimension into a higher-dimensional space using a $d$-degree polynomial?

PS: A generalization to $D$-dimensional data points would be very helpful.
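One way to generalize (an illustrative sketch in Python rather than the course's MATLAB, using only the standard library): every monomial of total degree 1..d over D variables corresponds to a multiset of variable indices, which `itertools.combinations_with_replacement` enumerates directly:

```python
from itertools import combinations_with_replacement
from math import prod

def poly_features(x, d):
    """Map one sample x = [x_1, ..., x_D] to [1, all monomials of degree <= d]."""
    feats = [1.0]                          # bias term
    for degree in range(1, d + 1):
        for idx in combinations_with_replacement(range(len(x)), degree):
            feats.append(prod(x[i] for i in idx))
    return feats

# 2-D example, degree 2: [1, x1, x2, x1^2, x1*x2, x2^2]
print(poly_features([2.0, 3.0], 2))
```

For D = 2 this reproduces the column order of the MATLAB loop above; for larger D the same enumeration covers all cross-terms without any special-casing.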

#### Riak on top of LevelDB

I am planning to run Riak on top of LevelDB. I have downloaded the packages for Riak and LevelDB separately, but I am not sure how to link Riak with LevelDB - I didn't find any installation documents for running Riak on LevelDB. I am not sure whether I have to do these two installations separately, or whether there is a single package that contains Riak and a customized LevelDB. I am new to this and learning, so any suggestions will really help here.

Also, if I am heading into the right direction by installing Riak and LevelDB separately - how should I link them?

### Planet Theory

#### TR15-188 | Super-Linear Gate and Super-Quadratic Wire Lower Bounds for Depth-Two and Depth-Three Threshold Circuits | Ryan Williams, Daniel Kane

In order to formally understand the power of neural computing, we first need to crack the frontier of threshold circuits with two and three layers, a regime that has been surprisingly intractable to analyze. We prove the first super-linear gate lower bounds and the first super-quadratic wire lower bounds for depth-two linear threshold circuits with arbitrary weights, and depth-three majority circuits computing an explicit function.

$\bullet$ We prove that for all $\epsilon\gg \sqrt{\log(n)/n}$, the linear-time computable Andreev's function cannot be computed on a $(1/2+\epsilon)$-fraction of $n$-bit inputs by depth-two linear threshold circuits of $o(\epsilon^3 n^{3/2}/\log^3 n)$ gates, nor can it be computed with $o(\epsilon^{3} n^{5/2}/\log^{7/2} n)$ wires. This establishes an "average-case size hierarchy" for threshold circuits, as Andreev's function is computable by uniform depth-two circuits of $o(n^3)$ linear threshold gates, and by uniform depth-three circuits of $O(n)$ majority gates.

$\bullet$ We present a new function in $P$ based on small-biased sets, which we prove cannot be computed by a majority vote of depth-two linear threshold circuits with $o(n^{3/2}/\log^3 n)$ gates, nor with $o(n^{5/2}/\log^{7/2}n)$ wires.

$\bullet$ We give tight average-case (gate and wire) complexity results for computing PARITY with depth-two threshold circuits; the answer turns out to be the same as for depth-two majority circuits.

The key is a new random restriction lemma for linear threshold functions. Our main analytical tool is the Littlewood-Offord Lemma from additive combinatorics.

### QuantOverflow

#### performance of historical VaR parameters

A historical VaR measure is parameterized by the confidence level and the number of periods. Specifically, the $\alpha$% T-period VaR is defined as the portfolio loss $x$ in market value over time $T$ that is not expected to be exceeded with probability $(1 - \alpha)$.

I am looking for empirical backtesting research on the choice of T-period and $\alpha$% for producing stock portfolios. Usually this backtest research involves looking at the number of "exceptions" (violations of the predicted risk), convergence tests, and the Kupiec and Kuiper tests, or involves looking at the realized risk of a portfolio constructed to minimize the VaR measure.

An illustrative example of this research is here; however, that study covers the Greek equity market with a sample of only 5 equities, while my focus is U.S. equity portfolios of 25+ securities held for 3-month to 1-year periods. Another VaR study, covering the forex market, is here.
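As background for the backtesting terminology (my own stdlib sketch, not tied to the cited studies): a historical $\alpha$% VaR is just an empirical quantile of past losses, and an "exception" is a period whose realized loss exceeds it:

```python
# Sketch: historical VaR as an empirical loss quantile, plus an exception count.
# The return series below is made up for illustration.
returns = [0.01, -0.02, 0.005, -0.015, 0.02, -0.03, 0.007, -0.001, 0.012, -0.022]
alpha = 0.90  # confidence level

losses = sorted(-r for r in returns)            # losses are negated returns
var_index = int(alpha * len(losses))            # simple (non-interpolated) quantile
var = losses[min(var_index, len(losses) - 1)]
exceptions = sum(1 for r in returns if -r > var)
print(f"90% one-period VaR = {var:.3f}, exceptions = {exceptions}")
```

Backtests such as Kupiec's compare the observed exception count against the count the chosen $\alpha$ predicts.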

### Planet Theory

#### An approximation algorithm for Uniform Capacitated k-Median problem with 1 + {\epsilon} capacity violation

Authors: Jarosław Byrka, Bartosz Rybicki, Sumedha Uniyal
Abstract: We study the Capacitated k-Median problem, for which all the known constant factor approximation algorithms violate either the number of facilities or the capacities. While the standard LP-relaxation can only be used for algorithms violating one of the two by a factor of at least two, Shi Li [SODA'15, SODA'16] gave algorithms violating the number of facilities by a factor of 1+{\epsilon} exploring properties of extended relaxations.

In this paper we develop a constant factor approximation algorithm for Uniform Capacitated k-Median violating only the capacities by a factor of 1+{\epsilon}. The algorithm is based on a configuration LP. Unlike in the algorithms violating the number of facilities, we cannot simply open extra few facilities at selected locations. Instead, our algorithm decides about the facility openings in a carefully designed dependent rounding process.

#### Cost Minimizing Online Algorithms for Energy Storage Management with Worst-case Guarantee

Authors: Chi-Kin Chau, Guanglin Zhang, Minghua Chen
Abstract: The fluctuations of electricity prices in demand response schemes and intermittency of renewable energy supplies necessitate the adoption of energy storage in microgrids. However, it is challenging to design effective real-time energy storage management strategies that can deliver assured optimality, without being hampered by the uncertainty of volatile electricity prices and renewable energy supplies. This paper presents a simple effective online algorithm for the charging and discharging decisions of energy storage that minimizes the electricity cost in the presence of electricity price fluctuations and renewable energy supplies, without relying on the future information of prices, demands or renewable energy supplies. The proposed algorithm is supported by a near-best worst-case guarantee (i.e., competitive ratio), as compared to the offline optimal decisions based on full future information. Furthermore, the algorithm can be adapted to take advantage of limited future information, if available. By simulations on real-world data, it is observed that the proposed algorithms can achieve satisfactory outcome in practice.

#### Lift-and-Round to Improve Weighted Completion Time on Unrelated Machines

Authors: Nikhil Bansal, Ola Svensson, Aravind Srinivasan
Abstract: We consider the problem of scheduling jobs on unrelated machines so as to minimize the sum of weighted completion times. Our main result is a $(3/2-c)$-approximation algorithm for some fixed $c>0$, improving upon the long-standing bound of 3/2 (independently due to Skutella, Journal of the ACM, 2001, and Sethuraman & Squillante, SODA, 1999). To do this, we first introduce a new lift-and-project based SDP relaxation for the problem. This is necessary as the previous convex programming relaxations have an integrality gap of $3/2$. Second, we give a new general bipartite-rounding procedure that produces an assignment with certain strong negative correlation properties.

#### Calculating the Unrooted Subtree Prune-and-Regraft Distance

Authors: Chris Whidden, Frederick A. Matsen IV
Abstract: The subtree prune-and-regraft (SPR) distance metric is a fundamental way of comparing evolutionary trees. It has wide-ranging applications, such as to study lateral genetic transfer, viral recombination, and Markov chain Monte Carlo phylogenetic inference. Although the rooted version of SPR distance can be computed relatively efficiently between rooted trees using fixed-parameter-tractable maximum agreement forest (MAF) algorithms, no MAF formulation is known for the unrooted case. Correspondingly, previous algorithms are unable to compute unrooted SPR distances larger than 7.

In this paper, we substantially advance understanding of and computational algorithms for the unrooted SPR distance. First we identify four properties of minimal SPR paths, each of which suggests that no MAF formulation exists in the unrooted case. We then prove the 2008 conjecture of Hickey et al. that chain reduction preserves the unrooted SPR distance. This reduces the problem to a linear size problem kernel, substantially improving on the previous best quadratic size kernel. Then we introduce a new lower bound on the unrooted SPR distance called the replug distance that is amenable to MAF methods, and give an efficient fixed-parameter algorithm for calculating it. Finally, we develop a "progressive A*" search algorithm using multiple heuristics, including the TBR and replug distances, to exactly compute the unrooted SPR distance. Our algorithm is nearly two orders of magnitude faster than previous methods on small trees, and allows computation of unrooted SPR distances as large as 14 on trees with 50 leaves.

#### Multivariate Complexity Analysis of Geometric {\sc Red Blue Set Cover}

Authors: Pradeesha Ashok, Sudeshna Kolay, Saket Saurabh
Abstract: We investigate the parameterized complexity of \RBSC\ (\srbsc), a generalization of the classic \SC\ problem and the more recently studied \oRBSC\ problem. Given a universe $U$ containing $b$ blue elements and $r$ red elements, positive integers $k_\ell$ and $k_r$, and a family $\F$ of $\ell$ sets over $U$, the \srbsc\ problem is to decide whether there is a subfamily $\F'\subseteq \F$ of size at most $k_\ell$ that covers all blue elements, but at most $k_r$ of the red elements. This generalizes \SC\ and thus in full generality it is intractable in the parameterized setting. In this paper, we study a geometric version of this problem, called \slrbsc, where the elements are points in the plane and sets are defined by lines. We study this problem for an array of parameters, namely, $k_\ell, k_r, r, b$, and $\ell$, and all possible combinations of them. For all these cases, we either prove that the problem is W-hard or show that the problem is fixed parameter tractable (\FPT). In particular, on the algorithmic side, our study shows that a combination of $k_\ell$ and $k_r$ gives rise to a nontrivial algorithm for \slrbsc. On the hardness side, we show that the problem is para-\NP-hard when parameterized by $k_r$, and \W[1]-hard when parameterized by $k_\ell$. Finally, for the combination of parameters for which \slrbsc\ admits \FPT\ algorithms, we ask for the existence of polynomial kernels. We are able to provide a complete kernelization dichotomy by either showing that the problem admits a polynomial kernel or that it does not contain a polynomial kernel unless $\CoNP \subseteq \NP/\mbox{poly}$.

#### A Note on Fault Tolerant Reachability for Directed Graphs

Authors: Loukas Georgiadis, Robert E. Tarjan
Abstract: In this note we describe an application of low-high orders in fault-tolerant network design. Baswana et al. [DISC 2015] study the following reachability problem. We are given a flow graph $G = (V, A)$ with start vertex $s$, and a spanning tree $T =(V, A_T)$ rooted at $s$. We call a set of arcs $A'$ valid if the subgraph $G' = (V, A_T \cup A')$ of $G$ has the same dominators as $G$. The goal is to find a valid set of minimum size. Baswana et al. gave an $O(m \log{n})$-time algorithm to compute a minimum-size valid set in $O(m \log{n})$ time, where $n = |V|$ and $m = |A|$. Here we provide a simple $O(m)$-time algorithm that uses the dominator tree $D$ of $G$ and a low-high order of it.

#### Decoding Reed-Muller codes over product sets

Authors: John Kim, Swastik Kopparty
Abstract: We give a polynomial time algorithm to decode multivariate polynomial codes of degree $d$ up to half their minimum distance, when the evaluation points are an arbitrary product set $S^m$, for every $d < |S|$. Previously known algorithms can achieve this only if the set $S$ has some very special algebraic structure, or if the degree $d$ is significantly smaller than $|S|$. We also give a near-linear time randomized algorithm, which is based on tools from list-decoding, to decode these codes from nearly half their minimum distance, provided $d < (1-\epsilon)|S|$ for constant $\epsilon > 0$.

Our result gives an $m$-dimensional generalization of the well known decoding algorithms for Reed-Solomon codes, and can be viewed as giving an algorithmic version of the Schwartz-Zippel lemma.

#### Parity Separation: A Scientifically Proven Method for Permanent Weight Loss

Abstract: Given an edge-weighted graph G, let PerfMatch(G) denote the weighted sum over all perfect matchings M in G, weighting each matching M by the product of weights of edges in M. If G is unweighted, this plainly counts the perfect matchings of G.

In this paper, we introduce parity separation, a new method for reducing PerfMatch to unweighted instances: For graphs G with edge-weights -1 and 1, we construct two unweighted graphs G1 and G2 such that PerfMatch(G) = PerfMatch(G1) - PerfMatch(G2). This yields a novel weight removal technique for counting perfect matchings, in addition to those known from classical #P-hardness proofs. We derive the following applications:

1. An alternative #P-completeness proof for counting unweighted perfect matchings.

2. C=P-completeness for deciding whether two given unweighted graphs have the same number of perfect matchings. To the best of our knowledge, this is the first C=P-completeness result for the "equality-testing version" of any natural counting problem that is not already #P-hard under parsimonious reductions.

3. An alternative tight lower bound for counting unweighted perfect matchings under the counting exponential-time hypothesis #ETH.

Our technique is based upon matchgates and the Holant framework. To make our #P-hardness proof self-contained, we also apply matchgates for an alternative #P-hardness proof of PerfMatch on graphs with edge-weights -1 and 1.

#### Super-Linear Gate and Super-Quadratic Wire Lower Bounds for Depth-Two and Depth-Three Threshold Circuits

Authors: Daniel M. Kane, Ryan Williams
Abstract: In order to formally understand the power of neural computing, we first need to crack the frontier of threshold circuits with two and three layers, a regime that has been surprisingly intractable to analyze. We prove the first super-linear gate lower bounds and the first super-quadratic wire lower bounds for depth-two linear threshold circuits with arbitrary weights, and depth-three majority circuits computing an explicit function.

$\bullet$ We prove that for all $\epsilon\gg \sqrt{\log(n)/n}$, the linear-time computable Andreev's function cannot be computed on a $(1/2+\epsilon)$-fraction of $n$-bit inputs by depth-two linear threshold circuits of $o(\epsilon^3 n^{3/2}/\log^3 n)$ gates, nor can it be computed with $o(\epsilon^{3} n^{5/2}/\log^{7/2} n)$ wires. This establishes an "average-case size hierarchy" for threshold circuits, as Andreev's function is computable by uniform depth-two circuits of $o(n^3)$ linear threshold gates, and by uniform depth-three circuits of $O(n)$ majority gates.

$\bullet$ We present a new function in $P$ based on small-biased sets, which we prove cannot be computed by a majority vote of depth-two linear threshold circuits with $o(n^{3/2}/\log^3 n)$ gates, nor with $o(n^{5/2}/\log^{7/2}n)$ wires.

$\bullet$ We give tight average-case (gate and wire) complexity results for computing PARITY with depth-two threshold circuits; the answer turns out to be the same as for depth-two majority circuits.

The key is a new random restriction lemma for linear threshold functions. Our main analytical tool is the Littlewood-Offord Lemma from additive combinatorics.

#### A communication game related to the sensitivity conjecture

Authors: Justin Gilmer, Michal Koucký, Michael Saks
Abstract: One of the major outstanding foundational problems about boolean functions is the sensitivity conjecture, which (in one of its many forms) asserts that the degree of a boolean function (i.e. the minimum degree of a real polynomial that interpolates the function) is bounded above by some fixed power of its sensitivity (which is the maximum vertex degree of the graph defined on the inputs where two inputs are adjacent if they differ in exactly one coordinate and their function values are different). We propose an attack on the sensitivity conjecture in terms of a novel two-player communication game. A lower bound of the form $n^{\Omega(1)}$ on the cost of this game would imply the sensitivity conjecture.

To investigate the problem of bounding the cost of the game, three natural (stronger) variants of the question are considered. For two of these variants, protocols are presented that show that the hoped for lower bound does not hold. These protocols satisfy a certain monotonicity property, and (in contrast to the situation for the two variants) we show that the cost of any monotone protocol satisfies a strong lower bound.

There is an easy upper bound of $\sqrt{n}$ on the cost of the game. We also improve slightly on this upper bound.

#### Lower bounds for constant query affine-invariant LCCs and LTCs

Authors: Arnab Bhattacharyya, Sivakanth Gopi
Abstract: Affine-invariant codes are codes whose coordinates form a vector space over a finite field and which are invariant under affine transformations of the coordinate space. They form a natural, well-studied class of codes; they include popular codes such as Reed-Muller and Reed-Solomon. A particularly appealing feature of affine-invariant codes is that they seem well-suited to admit local correctors and testers.

In this work, we give lower bounds on the length of locally correctable and locally testable affine-invariant codes with constant query complexity. We show that if a code $\mathcal{C} \subset \Sigma^{\mathbb{K}^n}$ is an $r$-query locally correctable code (LCC), where $\mathbb{K}$ is a finite field and $\Sigma$ is a finite alphabet, then the number of codewords in $\mathcal{C}$ is at most $\exp(O_{\mathbb{K}, r, |\Sigma|}(n^{r-1}))$. Also, we show that if $\mathcal{C} \subset \Sigma^{\mathbb{K}^n}$ is an $r$-query locally testable code (LTC), then the number of codewords in $\mathcal{C}$ is at most $\exp(O_{\mathbb{K}, r, |\Sigma|}(n^{r-2}))$. The dependence on $n$ in these bounds is tight for constant-query LCCs/LTCs, since Guo, Kopparty and Sudan (ITCS 13) construct affine-invariant codes via lifting that have the same asymptotic tradeoffs. Note that our result holds for non-linear codes, whereas previously, Ben-Sasson and Sudan (RANDOM 11) assumed linearity to derive similar results.

Our analysis uses higher-order Fourier analysis. In particular, we show that the codewords corresponding to an affine-invariant LCC/LTC must be far from each other with respect to Gowers norm of an appropriate order. This then allows us to bound the number of codewords, using known decomposition theorems which approximate any bounded function in terms of a finite number of low-degree non-classical polynomials, upto a small error in the Gowers norm.

#### Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality

Authors: Mark Braverman, Ankit Garg, Tengyu Ma, Huy L. Nguyen, David P. Woodruff
Abstract: We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions. In the distributed sparse Gaussian mean estimation problem, each of the $m$ machines receives $n$ data points from a $d$-dimensional Gaussian distribution with unknown mean $\theta$ which is promised to be $k$-sparse. The machines communicate by message passing and aim to estimate the mean $\theta$. We provide a tight (up to logarithmic factors) tradeoff between the estimation error and the number of bits communicated between the machines. This directly leads to a lower bound for the distributed sparse linear regression problem: to achieve the statistical minimax error, the total communication is at least $\Omega(\min\{n,d\}m)$, where $n$ is the number of observations that each machine receives and $d$ is the ambient dimension. We also give the first optimal simultaneous protocol in the dense case for mean estimation.

As our main technique, we prove a distributed data processing inequality, as a generalization of usual data processing inequalities, which might be of independent interest and useful for other problems.

#### Structural Resolution: a Framework for Coinductive Proof Search and Proof Construction in Horn Clause Logic. (arXiv:1511.07865v1 [cs.LO])

Logic programming (LP) is a programming language based on first-order Horn clause logic that uses SLD-resolution as a semi-decision procedure. Finite SLD-computations are inductively sound and complete with respect to least Herbrand models of logic programs. Dually, the corecursive approach to SLD-resolution views infinite SLD-computations as successively approximating infinite terms contained in programs' greatest complete Herbrand models. State-of-the-art algorithms implementing corecursion in LP are based on loop detection. However, such algorithms support inference of logical entailment only for rational terms, and they do not account for the important property of productivity in infinite SLD-computations. Loop detection thus lags behind coinductive methods in interactive theorem proving (ITP) and term-rewriting systems (TRS).

Structural resolution is a newly proposed alternative to SLD-resolution that makes it possible to define and semi-decide a notion of productivity appropriate to LP. In this paper we show that productivity supports the development of a new coinductive proof principle for LP that semi-decides logical entailment by observing finite fragments of resolution computations for productive programs. This severs the dependence of coinductive proof on term rationality, and puts coinductive methods in LP on par with productivity-based observational approaches to coinduction in ITP and TRS. We prove soundness of structural resolution relative to Herbrand model semantics for productive inductive, coinductive, and mixed inductive-coinductive logic programs.

#### The Shortest Connection Game. (arXiv:1511.07847v1 [cs.GT])

We introduce the Shortest Connection Game, a two-player game played on a directed graph with edge costs. Given two designated starting vertices, the players take turns choosing edges emanating from the vertex they are currently located at. In this way, each player forms a path that originates at its respective starting vertex. The game ends as soon as the two paths meet, i.e., a connection between the players is established. Each player has to carry the cost of its chosen edges and thus aims at minimizing its own total cost.

In this work we analyze the computational complexity of Shortest Connection Game. On the negative side, the game turns out to be computationally hard even on restricted graph classes such as bipartite, acyclic and cactus graphs. On the positive side, we can give a polynomial time algorithm for cactus graphs when the game is restricted to simple paths.

#### Incremental Query Processing on Big Data Streams. (arXiv:1511.07846v1 [cs.DB])

This paper addresses online processing for large-scale, incremental computations on a distributed stream processing engine (DSPE). Our goal is to convert any distributed batch query to an incremental DSPE program automatically. In contrast to other approaches, we derive incremental programs that return accurate results, not approximate answers, by retaining a minimal state during the query evaluation lifetime and by using incremental evaluation techniques to return an accurate snapshot answer at each time interval that depends on the current state and the latest batches of data. Our methods can handle many forms of queries, including iterative and nested queries, group-by with aggregation, and joins on one-to-many relationships. Finally, we report on a prototype implementation of our framework using MRQL running on top of Spark and we experimentally validate the effectiveness of our methods.

#### Box-Cox transformation of firm size data in statistical analysis. (arXiv:1511.07821v1 [stat.AP])

Firm size data usually do not show the normality that is often assumed in statistical analyses such as regression analysis. In this study we focus on two firm size measures: the number of employees and sales. Both deviate considerably from a normal distribution. To improve the normality of these data we transform them by the Box-Cox transformation with appropriate parameters. The Box-Cox transformation parameters are determined so that the transformed data best show the kurtosis of a normal distribution. It is found that the two firm size measures transformed by the Box-Cox transformation show strong linearity. This indicates that the number of employees and sales have similar properties as firm size indicators. The Box-Cox parameters obtained for the firm size data are found to be very close to zero, in which case the Box-Cox transformation is approximately a log-transformation. This suggests that the firm size data we used are approximately log-normally distributed.
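The selection procedure the abstract describes (transform the data, then pick the parameter whose output best matches normal kurtosis) can be sketched in pure Python. The λ grid and the synthetic log-normal "firm sizes" below are illustrative assumptions, not the paper's data:

```python
import math
import random

def boxcox(x, lam):
    """Box-Cox transform of a positive value x; reduces to log(x) as lam -> 0."""
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def excess_kurtosis(values):
    """Sample excess kurtosis; approximately 0 for normally distributed data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2) - 3.0

def best_boxcox_lambda(data, grid):
    """Pick the lambda whose transformed data has kurtosis closest to normal."""
    return min(grid, key=lambda lam: abs(excess_kurtosis([boxcox(x, lam) for x in data])))

random.seed(0)
# Hypothetical stand-in for firm sizes: log-normally distributed values.
firm_sizes = [math.exp(random.gauss(4.0, 1.0)) for _ in range(5000)]
grid = [i / 10 for i in range(-10, 11)]  # candidate lambdas in [-1, 1]
lam = best_boxcox_lambda(firm_sizes, grid)
print(lam)  # for log-normal input this should land near 0
```

For genuinely log-normal data the selected λ sits near zero, which is exactly the paper's observation that the fitted transformation degenerates to a log-transform.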

#### Two Countermeasures Against Hardware Trojans Exploiting Non-Zero Aliasing Probability of BIST. (arXiv:1511.07792v1 [cs.CR])

The threat of hardware Trojans has been widely recognized by academia, industry, and government agencies. A Trojan can compromise the security of a system in spite of cryptographic protection. The damage caused by a Trojan may not be limited to a business or reputation, but could have a severe impact on public safety, the national economy, or national security. An extremely stealthy way of implementing hardware Trojans was presented by Becker et al. at CHES 2012. Their work has shown that it is possible to inject a Trojan into a random number generator compliant with the FIPS 140-2 and NIST SP800-90 standards by exploiting the non-zero aliasing probability of Logic Built-In Self-Test (LBIST). In this paper, we present two methods for modifying LBIST to prevent such an attack. The first method makes test patterns dependent on a configurable key which is programmed into a chip after the manufacturing stage. The second method uses a remote test management system which can execute LBIST using a different set of test patterns at each test cycle.

#### A Distributed System for Storing and Processing Data from Earth-observing Satellites: System Design and Performance Evaluation of the Visualisation Tool. (arXiv:1511.07693v1 [cs.DC])

We present a distributed system for storage, processing, three-dimensional visualisation and basic analysis of data from Earth-observing satellites. The database and the server have been designed for high performance and scalability, whereas the client is highly portable thanks to having been designed as an HTML5- and WebGL-based Web application. The system is based on the so-called MEAN stack, a modern replacement for LAMP which has steadily been gaining traction among high-performance Web applications. We demonstrate the performance of the system from the perspective of a user operating the client.

#### Approximate Probabilistic Inference via Word-Level Counting. (arXiv:1511.07663v1 [cs.AI])

Hashing-based model counting has emerged as a promising approach for large-scale probabilistic inference on graphical models. A key component of these techniques is the use of xor-based 2-universal hash functions that operate over Boolean domains. Many counting problems arising in probabilistic inference are, however, naturally encoded over finite discrete domains. Techniques based on bit-level (or Boolean) hash functions require these problems to be propositionalized, making it impossible to leverage the remarkable progress made in SMT (Satisfiability Modulo Theory) solvers that can reason directly over words (or bit-vectors). In this work, we present the first approximate model counter that uses word-level hashing functions, and can directly leverage the power of sophisticated SMT solvers. Empirical evaluation over an extensive suite of benchmarks demonstrates the promise of the approach.

#### Efficient Resource Sharing Through GPU Virtualization on Accelerated High Performance Computing Systems. (arXiv:1511.07658v1 [cs.DC])

The High Performance Computing (HPC) field is witnessing a widespread adoption of Graphics Processing Units (GPUs) as co-processors for conventional homogeneous clusters. The adoption of the prevalent Single-Program Multiple-Data (SPMD) programming paradigm for GPU-based parallel processing brings in the challenge of resource underutilization, with the asymmetrical processor/co-processor distribution. In other words, under SPMD, a balanced CPU/GPU distribution is required to ensure full resource utilization. In this paper, we propose a GPU resource virtualization approach to allow underutilized microprocessors to efficiently share the GPUs. We propose an efficient GPU sharing scenario achieved through GPU virtualization and analyze the performance potentials through execution models. We further present the implementation details of the virtualization infrastructure, followed by experimental analyses. The results demonstrate considerable performance gains with GPU virtualization. Furthermore, the proposed solution enables full utilization of asymmetrical resources, through efficient GPU sharing among microprocessors, while incurring low overhead due to the added virtualization layer.

#### Towards A Marketplace for Mobile Content: Dynamic Pricing and Proactive Caching. (arXiv:1511.07573v1 [cs.GT])

In this work, we investigate the profit maximization problem for a wireless network carrier and the payment minimization for end-users. Motivated by recent findings on proactive resource allocation, we focus on the scenario whereby end-users who are equipped with device-to-device (D2D) communication can harness predictable demand in proactive caching of data contents and the possibility of trading their proactive downloads to minimize their expected payments. The carrier, on the other hand, utilizes a dynamic pricing scheme to differentiate between off-peak and peak time prices and applies commissions on each trading process to further maximize its profit. A novel marketplace based on risk sharing between end-users is proposed, where the tension between carrier and end-users is formulated as a Stackelberg game. The existence and uniqueness of the non-cooperative sub-game Nash equilibrium is shown. Furthermore, we explore the equilibrium points for the cases when D2D is and is not available, and study the impact of the uncertainty of users' future demands on the system's performance. In particular, we compare the new equilibrium with the baseline scenario of flat pricing. Despite end-users' connectivity with each other, the uncertainty of their future demands, and the freshness of the pre-cached contents, we characterize a new equilibrium region which yields a win-win situation with respect to the baseline equilibrium. We show that end-users' activity patterns can be harnessed to maximize the carrier's profit while minimizing the end-users' expected payments.

#### A Symbolic Logic with Concrete Bounds for Cryptographic Protocols. (arXiv:1511.07536v1 [cs.LO])

We present a formal logic for quantitative reasoning about security properties of network protocols. The system allows us to derive concrete security bounds that can be used to choose key lengths and other security parameters. We provide axioms for reasoning about digital signatures and random nonces, with security properties based on the concrete security of signature schemes and pseudorandom number generators (PRG). The formal logic supports first-order reasoning and reasoning about protocol invariants, taking concrete security bounds into account. Proofs constructed in our logic also provide conventional asymptotic security guarantees because of the way that concrete bounds accumulate in proofs. As an illustrative example, we use the formal logic to prove an authentication property with concrete bounds of a signature-based challenge-response protocol.

#### The Limitations of Deep Learning in Adversarial Settings. (arXiv:1511.07528v1 [cs.CR])

Deep learning takes advantage of large datasets and computationally efficient training algorithms to outperform other approaches at various machine learning tasks. However, imperfections in the training phase of deep neural networks make them vulnerable to adversarial samples: inputs crafted by adversaries with the intent of causing deep neural networks to misclassify. In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs. In an application to computer vision, we show that our algorithms can reliably produce samples correctly classified by human subjects but misclassified as specific targets by a DNN with a 97% adversarial success rate while only modifying on average 4.02% of the input features per sample. We then evaluate the vulnerability of different sample classes to adversarial perturbations by defining a hardness measure. Finally, we describe preliminary work outlining defenses against adversarial samples by defining a predictive measure of distance between a benign input and a target classification.

#### Resource Allocation and Outage Analysis for An Adaptive Cognitive Two-Way Relay Network. (arXiv:1511.07469v1 [cs.IT])

In this paper, an adaptive two-way relay cooperation scheme is studied for multiple-relay cognitive radio networks to improve the performance of secondary transmissions. The power allocation scheme is derived to minimize the secondary outage probability. We also provide best relay selection for the two-way relay network using a max-min criterion. Exact closed-form expressions of secondary outage probability are derived under a constraint on the quality of service of primary transmissions in terms of the required primary outage probability. To better understand the impact of primary user interference on secondary transmissions, we further investigate the asymptotic behavior of the secondary relay network when the primary signal-to-noise ratio goes to infinity, including power allocation and outage probability. Simulation results are provided to illustrate the performance of the proposed schemes.

#### Minimizing Total Busy Time for Energy-Aware Virtual Machine Allocation Problems. (arXiv:1511.07423v1 [cs.DC])

This paper investigates energy-aware virtual machine (VM) allocation problems in clouds with three characteristics: multiple resources, fixed interval times, and non-preemption of virtual machines. Many previous works aim to use a minimum number of physical machines; however, this is not necessarily a good solution for minimizing total energy consumption in VM placement with multiple resources, fixed interval times, and non-preemption. We observe that minimizing the sum of the total busy time of all physical machines implies minimizing the total energy consumption of the physical machines. In addition, if two mappings of a VM onto physical machines have the same total busy time, the better mapping is the one that minimizes the physical machine's remaining available resources. Based on these observations, we propose a heuristic-based EM algorithm to solve energy-aware VM allocation with fixed starting and duration times. In addition, this work studies heuristics for sorting the list of virtual machines (e.g., by earliest starting time, latest finishing time, or longest duration time first) before allocating VMs. We evaluate EM using the CloudSim toolkit and job log-traces from Feitelson's Parallel Workloads Archive. Simulation results show that the EM-ST, EM-LFT and EM-LDTF algorithms all reduce total energy consumption compared to state-of-the-art power-aware VM allocation algorithms (e.g., Power-Aware Best-Fit Decreasing (PABFD) [7]).

### StackOverflow

#### python classification without having to impute missing values

I have a dataset that is working nicely in weka. It has a lot of missing values represented by '?'. Using a decision tree, I am able to deal with the missing values.

However, in scikit-learn, I see that the estimators can't be used with data containing missing values. Is there an alternative library I can use instead that would support this?

Otherwise, is there a way to get around this in sci-kit learn?
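One way around this in scikit-learn is to impute the missing entries before fitting any estimator; scikit-learn ships an imputation transformer for exactly this (`Imputer` in `sklearn.preprocessing` as of 2015). A minimal library-free sketch of the same idea, mean imputation over '?'-marked columns, with a made-up data matrix:

```python
def impute_mean(rows, missing="?"):
    """Replace missing entries (marked '?') with the column mean of observed values."""
    cols = list(zip(*rows))
    means = []
    for col in cols:
        observed = [float(v) for v in col if v != missing]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [float(v) if v != missing else means[j] for j, v in enumerate(row)]
        for row in rows
    ]

data = [
    ["1.0", "?", "3.0"],
    ["2.0", "4.0", "?"],
    ["3.0", "6.0", "9.0"],
]
print(impute_mean(data))
# column means of observed values: col0 -> 2.0, col1 -> 5.0, col2 -> 6.0
```

The imputed numeric matrix can then be fed to any scikit-learn estimator; whether mean imputation is statistically appropriate depends on the data, which is why tree libraries with native missing-value support (like Weka's) can behave differently.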

### CompsciOverflow

#### Approximate Subset Sum with negative numbers

I am interested in the approximation version of the Subset Sum problem with negative numbers. Wikipedia says there is an FPTAS algorithm for SS. That Wikipedia page states:

If all numbers are non-negative, the approximate subset sum is solvable in time polynomial in N and 1/c.

Similarly, in the CLRS version of the algorithm, the numbers are required to be positive. For the exact version there is this reduction that makes all numbers positive, but I believe that it cannot be applied to the approximation version. Because the approximation algorithm behaves differently for larger numbers, it would not give the same results if instead of -10 I have 10000. The trim function would cut a lot more numbers in the new, all-positive version than in the original.

My question: Is there an FPTAS for General-SS, as defined below?

General-SS:
Given an input set $S=\{a_1,\dots,a_n\}$, where $a_i$ are possibly negative numbers, and a target $C$ find a subset $S'\subseteq S$ that sums to $C$.

All-positive-SS (defined below) has an FPTAS. Does this algorithm remain an FPTAS for General-SS?

All-positive-SS:
The same as General-SS, but where we restrict $a_i\geq 0$.

Alternatively, if we transform General-SS to All-positive-SS with the above reduction and then run the FPTAS algorithm on the new instance, does this also give an FPTAS for General-SS?
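For concreteness, the all-positive FPTAS being referenced (CLRS-style list trimming) can be sketched as follows; the instance is a made-up example, and the sketch assumes positive integers, which is exactly the restriction the question is about:

```python
def trim(values, delta):
    """Drop values within a (1+delta) factor of the last kept value (CLRS trimming)."""
    trimmed = [values[0]]
    last = values[0]
    for v in values[1:]:
        if v > last * (1 + delta):
            trimmed.append(v)
            last = v
    return trimmed

def approx_subset_sum(items, target, eps):
    """FPTAS for subset sum over positive integers: returns a sum >= (1-eps)*OPT."""
    sums = [0]
    n = len(items)
    for x in items:
        merged = sorted(set(sums + [s + x for s in sums]))
        sums = trim(merged, eps / (2 * n))
        sums = [s for s in sums if s <= target]
    return max(sums)

# Made-up instance: the exact optimum not exceeding 308 is 104 + 102 + 101 = 307.
print(approx_subset_sum([104, 102, 201, 101], 308, 0.01))  # tiny eps recovers 307
```

The multiplicative trimming is where the question's concern lives: it only makes sense when all achievable sums are nonnegative, and shifting negative inputs by a large constant changes which sums the trim step discards.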

### StackOverflow

In my calc function I'm trying to add a stored “memory” number. At any time, the user can type “store”, which will store the current total. At any later time, the user can type “retrieve”, and the total will change to the stored total:

import Data.Char
import Data.String

calc :: IO ()
calc = calch 0

calch :: Float -> IO()
calch y x = do
putStrLn $ "Total is " ++ show x
input <- getLine
calcProcessor y x (map toLower input)

calcProcessor :: Float -> String -> IO ()
calcProcessor y x input
| input == "quit" = return ()
| input == "store" = calch (x) 0
| input == "retrieve" = calch 0 (y)
| head input == '+' = calch y (x + read (drop 2 input))
| head input == '*' = calch y (x * read (drop 2 input))
| input == "reciprocal" = calch y (1/x)


I get a very long error saying:

9.hs:8:1:
Couldn't match expected type ‘String -> IO b0’
with actual type ‘IO ()’
The equation(s) for ‘calch’ have two arguments,
but its type ‘Float -> IO ()’ has only one

9.hs:11:5:
Couldn't match expected type ‘[Char] -> IO b0’
with actual type ‘IO ()’
The function ‘calcProcessor’ is applied to three arguments,
but its type ‘Float -> String -> IO ()’ has only two
In a stmt of a 'do' block: calcProcessor y x (map toLower input)
In the expression: do { putStrLn $ "Total is " ++ show x;
input <- getLine;
calcProcessor y x (map toLower input) }

9.hs:15:1:
Couldn't match expected type ‘[Char] -> m0 ()’
with actual type ‘IO ()’
The equation(s) for ‘calcProcessor’ have three arguments,
but its type ‘Float -> String -> IO ()’ has only two

9.hs:17:36:
Couldn't match expected type ‘Integer -> m0 ()’
with actual type ‘IO ()’
The function ‘calch’ is applied to two arguments,
but its type ‘Float -> IO ()’ has only one
In the expression: calch (x) 0
In an equation for ‘calcProcessor’:
calcProcessor y x input
| input == "quit" = return ()
| input == "store" = calch (x) 0
| input == "retrieve" = calch 0 (y)
| head input == '+' = calch y (x + read (drop 2 input))
| head input == '*' = calch y (x * read (drop 2 input))
| input == "reciprocal" = calch y (1 / x)

9.hs:17:43:
Couldn't match type ‘[Char]’ with ‘Float’
Expected type: Float
Actual type: String
In the first argument of ‘calch’, namely ‘(x)’
In the expression: calch (x) 0

9.hs:18:36:
Couldn't match expected type ‘Float -> m0 ()’
with actual type ‘IO ()’
The function ‘calch’ is applied to two arguments,
but its type ‘Float -> IO ()’ has only one
In the expression: calch 0 (y)
In an equation for ‘calcProcessor’:
calcProcessor y x input
| input == "quit" = return ()
| input == "store" = calch (x) 0
| input == "retrieve" = calch 0 (y)
| head input == '+' = calch y (x + read (drop 2 input))
| head input == '*' = calch y (x * read (drop 2 input))
| input == "reciprocal" = calch y (1 / x)

9.hs:19:36:
Couldn't match expected type ‘String -> m0 ()’
with actual type ‘IO ()’
The function ‘calch’ is applied to two arguments,
but its type ‘Float -> IO ()’ has only one
In the expression: calch y (x + read (drop 2 input))
In an equation for ‘calcProcessor’:
calcProcessor y x input
| input == "quit" = return ()
| input == "store" = calch (x) 0
| input == "retrieve" = calch 0 (y)
| head input == '+' = calch y (x + read (drop 2 input))
| head input == '*' = calch y (x * read (drop 2 input))
| input == "reciprocal" = calch y (1 / x)

9.hs:20:36:
Couldn't match expected type ‘String -> m0 ()’
with actual type ‘IO ()’
The function ‘calch’ is applied to two arguments,
but its type ‘Float -> IO ()’ has only one
In the expression: calch y (x * read (drop 2 input))
In an equation for ‘calcProcessor’:
calcProcessor y x input
| input == "quit" = return ()
| input == "store" = calch (x) 0
| input == "retrieve" = calch 0 (y)
| head input == '+' = calch y (x + read (drop 2 input))
| head input == '*' = calch y (x * read (drop 2 input))
| input == "reciprocal" = calch y (1 / x)

9.hs:21:36:
Couldn't match expected type ‘String -> m0 ()’
with actual type ‘IO ()’
The function ‘calch’ is applied to two arguments,
but its type ‘Float -> IO ()’ has only one
In the expression: calch y (1 / x)
In an equation for ‘calcProcessor’:
calcProcessor y x input
| input == "quit" = return ()
| input == "store" = calch (x) 0
| input == "retrieve" = calch 0 (y)
| head input == '+' = calch y (x + read (drop 2 input))
| head input == '*' = calch y (x * read (drop 2 input))
| input == "reciprocal" = calch y (1 / x)


I am assuming it is a type error. I know the code works fine without

 | input == "store"             = calch (x) 0
| input == "retrieve"          = calch 0 (y)


so I know that the problem is in there, but I'm not sure how to fix it.

#### I need to fix this program but I don't know where to start

I need this code to ask for seven inputs for each day of the week then find the day of the week that has the min and max and display the day with it.

def main():
    printWelcome()
    sales = []
    days = ['Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday']
    index = 0
    total = 0
    for days in index:
        sale = int(input('Enter Sales for '))
        sale.append(sales)
        index = index + 1
    maxsales = max(sales)
    #try and find the index number with the maximum sales
    setDay()
    print('Maximum sales of',maxsales,'happened on',)
    #repeat steps 5-8
    minsales = min(sales)
    print('Minimum sales of',minsales,'happened on',)
    findtotal(sales)
    for value in sales:
        total += int(value)
    average = total / len(sales)
    print('The average of the sales is'+ format(average,".2f"))
    print('The total sales for the week are'+ format(total,".2f"))
    printCommission(sales)

def printWelcome():
    print('The Widgets R Us Sales Calculator Program')

def setDay():
    print("")

def findTotal(sales1):
    total = 0.0
    for value in sales1:
        total += int(value)

def printCommission(sales1):
    if sales1 <= 100.00:
        print("Sales too low for a commission. Must earn more than $100")
    else:
        if sales1 >= 100.00 and sales1 <= 250.00:
            print("Sales commission for this week is $25.00")
        else:
            if sales1 >= 250.00 and sales1 <= 500.00:
                print("Sales commission for this week is $30.00")
            else:
                if sales1 > 500.00:
                    print("Sales commision for this week is $40.00")

main()
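For the min/max-day requirement described above, the key missing step is aligning each sales figure with its day by index. A minimal sketch of that logic, with hypothetical hard-coded figures standing in for the seven input() calls:

```python
days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
sales = [120, 80, 300, 45, 510, 95, 220]  # made-up weekly figures, one per day

max_day = days[sales.index(max(sales))]   # day whose position matches the largest sale
min_day = days[sales.index(min(sales))]   # day whose position matches the smallest sale

print('Maximum sales of', max(sales), 'happened on', max_day)  # 510 on Thursday
print('Minimum sales of', min(sales), 'happened on', min_day)  # 45 on Wednesday
```

Because the two lists are built in parallel (append one sale per day), `list.index` on the extreme value recovers the matching day name directly.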


### QuantOverflow

#### Match different option high frequency databases

I downloaded the “E-mini S&P 500 (Dollar) Options for 1/10/11” Top-of-Book (BBO) data. If you are interested you may download the data from the following link (approx. 80MB zipped and 1GB unzipped)

http://www.cmegroup.com/market-data/datamine-historical-data/files/XCME_ES_OPT_110110.zip

The data fields are described in the Best Bid Best Offer ASCII Layout table which can be found in the following link

http://www.cmegroup.com/confluence/display/EPICSANDBOX/Best+Bid+Best+Offer+ASCII+Layout

The basic idea is that each observation contains information about the Best Bid (highest buying quote) or Best Offer (Ask – the lowest selling quote) or the last trade.

I don’t understand the meaning of some of the Data Fields. I extensively searched the CME website but couldn’t find any answer. The data fields that puzzle me are the following

1. Close/Open Type: Indicator for Open ( O ) / Close ( C )
2. Valid Open Exception: Indicator for Special Open ( O )
3. Post Close: Indicator for prices traded after the market close ( P )
4. Insert Code Type: Indicator for Inserted prices ( I )
5. Fast/Late Indicator: Indicator for Fast/Late Market ( F )
6. Cabinet Indicator: Indicator for cabinet trades ( $ )
7. Book Indicator: Indicator for Book quotes ( B )

The reason I want to understand these fields is that I am trying to match this database with a similar database provided by the Options Price Reporting Authority (OPRA). The data were obtained from TickData. The documentation may be found at the following link:

https://s3-us-west-2.amazonaws.com/tick-data-s3/pdf/TickData_File_Format_Overview_US_Options.pdf

The basic idea behind the TickData data is that each observation contains, for each time, the best bid/best offer or trade. In addition, it provides the OPRA Condition Code for Quote and Trade Records. I hope that if I understand the above data field indicators I will be able to match them with the OPRA Condition Codes.

### Halfbakery

#### PenAgain Stylus (0.0)

### HN Daily

#### Daily Hacker News for 2015-11-24

### Planet Theory

#### On the Computational Complexity of Limit Cycles in Dynamical Systems

Authors: Christos H. Papadimitriou, Nisheeth K. Vishnoi
Download: PDF
Abstract: We study the Poincaré-Bendixson theorem for two-dimensional continuous dynamical systems in compact domains from the point of view of computation, seeking algorithms for finding the limit cycle promised by this classical result. We start by considering a discrete analogue of this theorem and show that both finding a point on a limit cycle, and determining whether a given point is on one, are PSPACE-complete. For the continuous version, we show that both problems are uncomputable in the real complexity sense; i.e., their complexity is arbitrarily high. Subsequently, we introduce a notion of an "approximate cycle" and prove an "approximate" Poincaré-Bendixson theorem guaranteeing that some orbits come very close to forming a cycle in the absence of approximate fixpoints; surprisingly, it holds for all dimensions. The corresponding computational problem defined in terms of arithmetic circuits is PSPACE-complete.
#### Tradeoffs for nearest neighbors on the sphere

Authors: Thijs Laarhoven
Download: PDF
Abstract: We consider tradeoffs between the query and update complexities for the (approximate) nearest neighbor problem on the sphere, extending the recent spherical filters to sparse regimes and generalizing the scheme and analysis to account for different tradeoffs. In a nutshell, for the sparse regime the tradeoff between the query complexity $n^{\rho_q}$ and update complexity $n^{\rho_u}$ for data sets of size $n$ is given by the following equation in terms of the approximation factor $c$ and the exponents $\rho_q$ and $\rho_u$: $$c^2\sqrt{\rho_q}+(c^2-1)\sqrt{\rho_u}=\sqrt{2c^2-1}.$$ For small $c=1+\epsilon$, minimizing the time for updates leads to a linear space complexity at the cost of a query time complexity $n^{1-4\epsilon^2}$. Balancing the query and update costs leads to optimal complexities $n^{1/(2c^2-1)}$, matching bounds from [Andoni-Razenshteyn, 2015] and [Dubiner, IEEE-TIT'10] and matching the asymptotic complexities of [Andoni-Razenshteyn, STOC'15] and [Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt, NIPS'15]. A subpolynomial query time complexity $n^{o(1)}$ can be achieved at the cost of a space complexity of the order $n^{1/(4\epsilon^2)}$, matching the bound $n^{\Omega(1/\epsilon^2)}$ of [Andoni-Indyk-Patrascu, FOCS'06] and [Panigrahy-Talwar-Wieder, FOCS'10] and improving upon results of [Indyk-Motwani, STOC'98] and [Kushilevitz-Ostrovsky-Rabani, STOC'98]. For large $c$, minimizing the update complexity results in a query complexity of $n^{2/c^2+O(1/c^4)}$, improving upon the related exponent for large $c$ of [Kapralov, PODS'15] by a factor $2$, and matching the bound $n^{\Omega(1/c^2)}$ of [Panigrahy-Talwar-Wieder, FOCS'08]. Balancing the costs leads to optimal complexities $n^{1/(2c^2-1)}$, while a minimum query time complexity can be achieved with update complexity $n^{2/c^2+O(1/c^4)}$, improving upon the previous best exponents of Kapralov by a factor $2$.
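As a quick sanity check on the stated tradeoff curve, one can verify numerically that the balanced point $\rho_q = \rho_u = 1/(2c^2-1)$ lies on it (a small sketch; the specific values of $c$ are arbitrary):

```python
import math

def tradeoff_lhs(c, rho_q, rho_u):
    """Left-hand side of the tradeoff: c^2*sqrt(rho_q) + (c^2-1)*sqrt(rho_u)."""
    return c**2 * math.sqrt(rho_q) + (c**2 - 1) * math.sqrt(rho_u)

for c in [1.5, 2.0, 4.0]:
    rho = 1.0 / (2 * c**2 - 1)      # balanced exponents: rho_q = rho_u
    lhs = tradeoff_lhs(c, rho, rho)
    rhs = math.sqrt(2 * c**2 - 1)   # right-hand side of the curve
    assert abs(lhs - rhs) < 1e-9    # the balanced point satisfies the equation
```

Algebraically this is immediate: with equal exponents the left side becomes $(2c^2-1)\sqrt{\rho} = (2c^2-1)/\sqrt{2c^2-1} = \sqrt{2c^2-1}$, the claimed optimal balanced complexity.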
## November 24, 2015

### StackOverflow

#### help organizing my data for this machine learning problem

I want to categorize tweets within a given set of categories like {'sports', 'entertainment', 'love'}, etc. My idea is to use the term frequencies of the most commonly used words to help me solve this problem. For example, the word 'love' shows up most frequently in the love category, but it also shows up in sports and entertainment in the form of "I love this game" and "I love this movie". To solve this, I envisioned a 3-axis graph where the x values are all the words used in my tweets, the y values are the categories, and the z values are the term frequencies (or some type of score) with respect to the word and the category. I would then break up the tweet onto the graph and add up the z values within each category. The category with the highest total z value is most likely the correct category.

I know this is confusing, so let me give an example: the word 'watch' shows up a lot in sports and entertainment ("I am watching the game" and "I am watching my favorite show"), so I narrow it down to those two categories at the least. But the word 'game' does not show up often in entertainment, and 'show' does not show up often in sports. The z value for 'watch' + 'game' will be highest for the sports category, and 'watch' + 'show' will be highest for entertainment.

Now that you understand how my idea works, I need help organizing this data so that a machine learning algorithm can predict categories when I give it a word or set of words. I've read a lot about SVMs and I think they're the way to go. I tried libsvm, but I can't seem to come up with a good input set. Also, libsvm does not support non-numeric values, which adds more complexity. Any ideas? Do I even need a library, or should I just code up the decision-making myself? Thanks all, I know this was long, sorry.
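The per-category term-frequency scoring idea sketched in that question can be prototyped in a few lines of plain Python before reaching for libsvm; the tiny labelled training set below is made up purely for illustration:

```python
from collections import Counter

# Hypothetical labelled tweets; real training data would be far larger.
training = [
    ("sports", "i love watching the game"),
    ("sports", "great game tonight"),
    ("entertainment", "i love watching my favorite show"),
    ("entertainment", "that show was great"),
    ("love", "i love you so much"),
]

# Per-category word counts: the "z values" from the question.
freq = {}
for category, text in training:
    freq.setdefault(category, Counter()).update(text.split())

def classify(tweet):
    """Score each category by summing per-word term frequencies; pick the max."""
    words = tweet.lower().split()
    scores = {c: sum(counts[w] for w in words) for c, counts in freq.items()}
    return max(scores, key=scores.get)

print(classify("watching the game"))  # -> sports
```

This is essentially a unigram bag-of-words classifier; the same word-count vectors, one column per vocabulary word, are exactly the numeric feature matrix an SVM library such as libsvm expects as input.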
### TheoryOverflow

#### Shrinkage Exponent of Formulas over the Full Binary Basis

Håstad has shown that the shrinkage exponent of Boolean formulas over the De Morgan basis is 2. In other words, if one keeps each variable of the formula alive with probability $p$ and restricts it with a uniform random bit otherwise, then the resulting formula can be shrunk by a factor of roughly $p^2$ in such a way that it computes the same function. This fact can be used to prove an $\Omega(n^2)$ lower bound for the parity on $n$ bits, and an $\Omega(n^{3-o(1)})$ lower bound for Andreev's function.

I have the following questions:

1. The shrinkage exponent of a formula over the full binary basis is 1 (consider for instance the parity function). Is there some relaxation of the notion of shrinkage exponent which is greater than 1 for formulas over the full binary basis?
2. Has such a relaxation of the notion of shrinkage exponent been used to obtain lower bounds?

### Lobsters

#### sometimes syscalls restart

#### MagSpoof - credit card/magstripe spoofer

#### The end of dynamic languages?

### Fefe

#### Good news! We caught bin Laden's right-hand man ...

Good news! We caught bin Laden's right-hand man!1!! An al-Nusra suicide bomber has killed the leadership of an ISIS militia in Syria.

### DataTau

#### Roles and Responsibilities of a Successful Analytics Team

### AWS

#### New AWS Quick Start – Sitecore

Sitecore is a popular enterprise content management system that also includes a multi-channel marketing automation component with an architecture that is a great fit for the AWS cloud! It allows marketers to deliver a personalized experience that takes into account the customers' prior interactions with the site and the brand (they call this feature Context Marketing). Today we are publishing a new Sitecore Quick Start Reference Deployment. This 19-page document will show you how to build an AWS cluster that is fault-tolerant and highly scalable.
It builds on the information provided in the Sitecore Scaling Guide and recommends an architecture that uses the Amazon Relational Database Service (RDS), Elastic Load Balancing, and Auto Scaling. Using the AWS CloudFormation template referenced in the Quick Start, you can launch Sitecore into an Amazon Virtual Private Cloud in a matter of minutes. The template creates a fully functional deployment of Sitecore 7.2 that runs on Windows Server 2012 R2. The production configuration runs in two Availability Zones. You can use the template as-is, or you can copy it and then modify it as you see fit. If you decide to do this, the new CloudFormation Visual Designer may be helpful. The Quick Start includes directions for setting up a test server along with some security guidelines. It also discusses the use of Amazon CloudFront to improve site performance and AWS WAF to help improve application security. Jeff;

### Planet Theory

#### Congratulations, Dr. Lam!

Jenny Lam, my teaching assistant for algorithms this quarter, passed her thesis defense today. She has been working with Sandy Irani, primarily on online algorithms for replacement and memory management strategies in heterogeneous caches (such as web proxies that have to maintain copies of documents of widely varying sizes), and has three papers on that subject, one at ALENEX and two at Middleware. She also has another paper on security-related algorithms in submission, on which I'm also a co-author. My understanding is that she will be teaching next semester at Pomona College on a temporary basis while she looks for a more permanent position. Congratulations!

### StackOverflow

#### Functional way to add to Lists that are Class-Members

I want to sort items of a class and collect them in Collection-Classes that, besides a List member, also contain further information necessary for the sorting process. The following example is a very simplified version of my problem.
Although it doesn't make much sense, I hope it can still help you understand my question.

type ItemType = Odd|Even //realworld: more than two types possible

type Item(number) =
    member this.number = number
    member this.Type = if (this.number % 2) = 0 then Even else Odd

type NumberTypeCollection(numberType:ItemType, ?items:List<Item>) =
    member this.ItemType = numberType
    member val items:List<Item> = defaultArg items List.empty<Item> with get,set
    member this.append(item:Item) = this.items <- item::this.items

let addToCollection (collections:List<NumberTypeCollection>) (item:Item) =
    let possibleItem =
        collections
        |> Seq.where (fun c -> c.ItemType = item.Type) //in my realworld code, several groups may be returned
        |> Seq.tryFind (fun _ -> true)
    match possibleItem with
    | Some(f) -> f.append item
                 collections
    | None -> NumberTypeCollection(item.Type, [item]) :: collections

let rec findTypes (collections:List<NumberTypeCollection>) (items:List<Item>) =
    match items with
    | [] -> collections
    | h::t -> let newCollections = (h |> addToCollection collections)
              findTypes newCollections t

let items = [Item(1);Item(2);Item(3);Item(4)]
let finalCollections = findTypes List.empty<NumberTypeCollection> items

I'm unsatisfied with the addToCollection method, since it requires the items in NumberTypeCollection to be mutable. Maybe there are further issues. What would be a proper functional solution to this? Edit: I'm sorry. My code was too simplified.
Here is a slightly more complex example that should hopefully illustrate why I chose the mutable class member (although this could still be the wrong decision):

open System

type Origin = Afrika|Asia|Australia|Europa|NorthAmerika|SouthAmerica

type Person(income, taxrate, origin:Origin) =
    member this.income = income
    member this.taxrate = taxrate
    member this.origin = origin

type PersonGroup(origin:Origin, ?persons:List<Person>) =
    member this.origin = origin
    member val persons:List<Person> = defaultArg persons List.empty<Person> with get,set
    member this.append(person:Person) = this.persons <- person::this.persons

//just some calculations to group people into some subgroups
let isInGroup (person:Person) (personGroup:PersonGroup) =
    let avgIncome =
        personGroup.persons
        |> Seq.map (fun p -> float(p.income * p.taxrate) / 100.0)
        |> Seq.average
    Math.Abs ((avgIncome / float person.income) - 1.0) < 0.5

let addToGroup (personGroups:List<PersonGroup>) (person:Person) =
    let possibleItem =
        personGroups
        |> Seq.where (fun p -> p.origin = person.origin)
        |> Seq.where (isInGroup person)
        |> Seq.tryFind (fun _ -> true)
    match possibleItem with
    | Some(f) -> f.append person
                 personGroups
    | None -> PersonGroup(person.origin, [person]) :: personGroups

let rec findPersonGroups (persons:List<Person>) (personGroups:List<PersonGroup>) =
    match persons with
    | [] -> personGroups
    | h::t -> let newGroup = (h |> addToGroup personGroups)
              findPersonGroups t newGroup

let persons = [Person(1000,20, Afrika);Person(1300,22,Afrika);Person(500,21,Afrika);Person(400,20,Afrika)]
let c = findPersonGroups persons List.empty<PersonGroup>

What I may need to emphasize: there can be several different groups with the same origin.

### Lobsters

#### A map of one million scientific papers from the arXiv

### UnixOverflow

#### Can I jail an older 32-bit FreeBSD on a (current) 64-bit FreeBSD?

My question is actually pretty much in the title.
Is it possible to jail an older 32-bit FreeBSD, such as 6.4 or 8.4, in a 64-bit FreeBSD 10.2? I'd also appreciate pointers and explanations on how to accomplish this, and information on what prerequisites my host needs to fulfill. NB: according to this blog article, jailing an older FreeBSD on a current one is possible, but that article makes no mention of 32-bit versus 64-bit.

### Lobsters

#### Machine learning and AI: The difference between knowledge and action

#### Supported Source - Great Software Projects Deserve Great Support

### QuantOverflow

#### Calibrating stochastic volatility model from price history (not option prices)

For stochastic volatility models like Heston, the standard approach seems to be to calibrate the model from option prices. This seems a bit like a chicken-and-egg problem -- wouldn't we prefer a model, based only on historical data, that we can use to price options? I don't see that as frequently. For the Heston model, I see the method of maximum likelihood used to calibrate against historical data. However, this requires that the conditional probability distribution be derived (which for the Heston model is widely available). For other, more complicated models, this gets more complicated. Are there approaches, other than maximum likelihood or fitting to observed option prices, used to calibrate stochastic volatility models?

### Fefe

#### WTO: No, you may not print on your tuna cans ...

WTO: No, you may not print on your tuna cans that the tuna was caught in a dolphin-safe way. This label is legally required in the USA to prevent the slaughter of dolphins as bycatch. The label would be a violation of the free-trade rights of Mexican fishermen and would endanger their profits. Just in case it wasn't yet clear to anyone what effects TTIP and co. will have.

#### A brief announcement from Donald Trump: Even if it doesn't ...
A brief announcement from Donald Trump: Even if it doesn't work they deserve it. Who would have thought that there is someone who would be even worse as president than Bush Junior!

#### A video has surfaced showing how ...

A video has surfaced showing how the Free Syrian Army destroys the Russian rescue helicopter. However, it had already landed at that spot. The device they use to destroy it is a so-called TOW; see Wikipedia. Money quote: A sudden influx of TOWs were supplied in May 2015, mostly to Free Syrian Army affiliated factions, but also independent Islamist battalions; as a requirement of being provided TOWs, these Syrian opposition groups are required to document the use of the missiles by filming their use, and are also required to save the spent missile casings.[24] Groups provided with TOWs include the Hazzm Movement, the 13th Division, 1st Coastal Division, Syria Revolutionaries Front, Yarmouk Army, Knights of Justice Brigade, and the 101st Division.[25] Free Syrian Army battalions widely and decisively used TOWs in the 2015 Jisr al-Shughur offensive.[26][27][28] In other words: these are weapons that the USA gave to the "rebels".

### TheoryOverflow

#### How to write the maximum length word in a language?

For a language $L$, as produced from a CFG, a finite automaton or similar, how can I express the length of the longest word? I know I can write $|L|$ for the length of all words in $L$. I thought I could write $\lceil L \rceil$ for the maximum. Do I need to write $\lceil |L| \rceil$ perhaps? Thank you,

#### How to constrain a finite automaton (NFA and DFA) to a tree?

I have a finite automaton by the standard model Hopcroft & Ullman define: $$M = (Q, \Sigma, \delta, q_0, F)$$ where $\delta$ is the transition function mapping $Q \times \Sigma \mapsto Q$, such that $\delta(q, a)$ is a state for each state $q \in Q$, the set of all states, and input symbol $a \in \Sigma$, the alphabet.
That allows $\delta$ to map to any element of $Q$. So that's a graph, although it's not described using the usual $G = (V, E)$ notation. Without specifying any particular definition for $\delta$, I'd like to be able to write the constraint that $\delta$ may only define transitions which form a tree. How can that be expressed? My thought is that I might say that $\delta$ must be recursive somehow (to give a tree shape), but I'm not sure how to go about that. Thank you,

### DragonFly BSD Digest

#### DragonFly default linker switched to gold

The default linker in DragonFly has been switched to gold, the newer version of ld. (Get it, go-ld?) It's faster and cleaner, going by the commit message. It's possible to switch back to the old one if needed. This predates the recent branch for 4.4, so it will be the default in the release, too.

### StackOverflow

#### Deeplearning4j Splitting datasets for test and train

Deeplearning4j has functions to support splitting datasets into test and train, as well as mechanisms for shuffling datasets; however, as far as I can tell, either they don't work or I'm doing something wrong. Example:

DataSetIterator iter = new IrisDataSetIterator(150, 150);
DataSet next = iter.next();
// next.shuffle();
SplitTestAndTrain testAndTrain = next.splitTestAndTrain(120, new Random(seed));
DataSet train = testAndTrain.getTrain();
DataSet test = testAndTrain.getTest();

for (int i = 0; i < 30; i++) {
    String features = test.getFeatures().getRow(i).toString();
    String actual = test.getLabels().getRow(i).toString().trim();
    log.info("features " + features + " -> " + actual);
}

This results in the last 30 rows of the input dataset being returned; the Random(seed) parameter to splitTestAndTrain seems to have been ignored completely.
If, instead of passing the random seed to splitTestAndTrain, I uncomment the next.shuffle() line, then oddly the 3rd and 4th features get shuffled while maintaining the existing order for the 1st and 2nd features as well as the test label, which is even worse than not sorting the input at all. So... the question is: am I using it wrong, or is Deeplearning4j just inherently broken? Bonus question: if Deeplearning4j is broken for something as simple as generating test and sample datasets, should it be trusted with anything at all? Or would I be better off using a different library?

### CompsciOverflow

#### Disjoint paths of length at most L and the number of nodes to remove to vanish this property - Inequality

Given a graph $G$ and two randomly chosen, non-adjacent nodes, let's call them $s$ and $t$, we make the following two notations: $P_L(s,t;G)$ = the maximum number of internally disjoint paths (paths that don't have any node in common, except $s$ and $t$) with a length of at most $L$ ($L$ ranges from 1 to $|G|$). $K_L(s,t;G)$ = the minimum size of a set of nodes ($s$ and $t$ not included in the sets) with this property: if we remove the set of nodes, there are no more paths of length at most $L$ from $s$ to $t$. I have to prove the inequality $$P_L(s,t;G) \le K_L(s,t;G)$$ and specify the values of $L$ ($L$ ranging from 2 to $|G|$) for which the relation satisfies the equality too, for any chosen graph $G$ and nodes $s,t$. The first idea that comes to my mind is to start a discussion based on the values of $L$ and try to prove the relation by induction. Any ideas how?

### Lobsters

#### Meeting Survival Checklist.

I put this up on the company's internal wiki a few years ago… Meeting Survival Checklist. It's no secret. I HATE meetings. They are the very bane of my existence. It's been so for decades. Give me an Alpine meadow with not a single person within a day's walk and I'll happily keep my own company for weeks.
Play Google Calendar's meeting notification sound and my stomach clenches in misery. If somebody would offer me a job without meetings, I'd take it tomorrow. But meetings seem to be an inevitable plague on my life… so as a New Year's rethink on how to cope with this, here is my "2013 meeting survival checklist".

Checklist.

1. Walk over and see if you can resolve the issue without a meeting.
2. Is a meeting the appropriate mechanism here? (Email, walk over and chat, telephone, …)
3. What is the purpose of this meeting? There is only one acceptable purpose for a meeting… Decide and Commit.
4. What is the agenda? What decision must be reached by the end of the meeting?
5. Can I decline, giving a reason? (e.g. nothing to contribute, delegating vote to proxy, …)
6. Can I cancel the meeting?
   1. Just because it's a regularly scheduled meeting, it doesn't mean we need to have this one.
   2. Do we still need it?
   3. Is everybody ready?
   4. Do we have the right people? (Can we reach a decision without X?)
7. Am I prepared? (Do we have all the information required to decide?)
   1. Schedule a slot in the calendar to do preparation.
   2. Accept or decline the meeting in Google Calendar. (Or you don't get appropriate reminders.)
   3. Do attendees have all relevant information?
   4. Room booked?
   5. Book the meeting room early to set up laptop / projector etc.
8. Check the calendar for the day for back-to-back meetings.
9. Schedule coffee / bathroom breaks if need be.
10. Bring interesting healthy food.
11. Take action minutes.
    - Use the agenda as a template… drive hard at concrete actions and at moving to the next item on the agenda.
12. Make your point an absolute maximum of two times.
    - Action is a much more effective conflict resolution method than debate.
13. Suppress noisy contributors. (Whinge at them when they have said the same thing for the third time.)
14. Reward Silent One contributions with acknowledgement and acceptance.
15. Look at the following image until perspective returns.
    http://apod.nasa.gov/apod/ap060321.html

16. Get up and leave.

So what's wrong with meetings anyway? Some love them, some hate them. Some people positively enjoy meetings. They are nice chatty social occasions to them. An opportunity to display their wit and enjoy the wit of friends, socialize with peers, climb the corporate ladder, advance their personal objectives, get things decided. Typically such people are extroverts. Unfortunately, extroverts typically call meetings blissfully unaware that they are inflicting deep psychological pain on their introverted colleagues.

Meetings are a broadcast medium. In a meeting, only one person can be speaking at a time… which is grreat if there is something everyone needs to hear. Very wasteful compared to multiple pairs of conversation.

Speech is slow. Introverts often get their entertainment from reading. Which tends to mean they are GOOD at reading. Which means they can read several times FASTER than you can speak. i.e. if you want to broadcast something to introverts… email them. Post on the internal wiki, but for pity's sake don't bore them to tears painfully slowly SPEAKING.

A 1-hour N-person meeting chews N man-hours. If there are N people in an hour-long meeting… you'd better get N hours of value for the company.

Contributions to social interactions are notoriously unevenly distributed. Observe any social interaction, in any medium you choose…. The "noisiest" person will contribute significantly more words, posts, documents, comments, whatever, than the next most chatty person. This is not a problem, it is the nature of humanity. However, in a dysfunctional meeting, this hierarchy of contribution is often so steep… that a significant number of attendees are completely silent. So in a meeting that isn't a "broadcast information" meeting… the meeting would run exactly the same, and come to exactly the same conclusion, if those silent ones weren't there.
In which case, it is obviously a waste of time forcing those silent ones to attend.

Meeting Survival Guide

Don't have that meeting! Get up, walk over and have a chat. Send an email. Odds on you don't need to have a meeting.

Do you need a meeting? So when should you have a meeting?

1. When everybody is a Happy Shiny Extrovert and would enjoy it! (Hey, if everybody is loving it… why not!)
2. When things are falling into the gaps between people's responsibilities.
3. When a rapid to-and-fro discussion between parties resulting in a decision is needed.
4. When you need to broadcast information AND gain immediate feedback about how people FEEL about it. (Note the emphasis on FEEL; if you want to know what people THINK about something, don't have a meeting. Give them time to think.)

Have a clear purpose for the meeting, and set an agenda. Write down in one sentence what you hope to have achieved by the end of the meeting. Put that in the calendar. If you can't do that… for pity's sake cancel the meeting! Write down what will be covered in the meeting. If you expect a decision, make very sure people have the chance to do the appropriate preparation. Check they have done it and postpone if need be. Otherwise expect a frustrating talk-fest that goes nowhere and concludes in "we don't know, we will have to investigate".

Decline that meeting. We're friendly, hardworking and polite folk. Somebody asks for a meeting… we say "Yeah, sure." Don't. Push back. If you cannot see a clear purpose for the meeting or an agenda… ask for one. If you know you don't have anything valuable to contribute… DECLINE IT WITH A COMMENT! e.g. "Sorry, I don't think I have anything valuable to add here… I will go with whatever you decide." Don't just not attend.
That leaves everyone else hanging: "We can't decide on X, we don't know what Y thinks because he isn't here." Or worse, much much worse, if you just don't attend a meeting… and then at the next meeting reopen and rehash everything that was painfully sorted out at the previous meeting…. I HATE YOU FOREVER! If you must do that, DECLINE THE MEETING with the comment that you explicitly want decisions on those matters postponed.

Cancel that meeting. Not ready yet? Postpone it. No progress since last time? Cancel it! Haven't got the appropriate people in the room to decide… CANCEL IT! I hate meetings. I think I mentioned that. So if I screw up my courage… drag my ass to the torture chamber with a feeling of dread… AND YOU DON'T EVEN SHOW UP!!!! Yup… I will be angry. Just cancel it. I'll heave a sigh of relief and feel my day has taken a leap for the better.

Do your preparation. If the meeting is to decide on a document… you'll earn my deep gut hate if your opening remarks are "I didn't get time to read it all but…." For pity's sake, POSTPONE the meeting, DECLINE the meeting (with comment), SHUT UP or DO YOUR JOB and read it. I really don't mind which of those alternatives you take. I even like it when people postpone a meeting saying, "This document is too broad in scope for me to decide on. Give me an executive summary of the following aspects that concern me, and I will restrict my reading AND remarks to those aspects." But "I didn't get time to read it all but…." will cause my blood pressure to fly through the roof! (Oh… and don't think just not using that phrase will save you… If it is graphically clear you haven't read it and are jabbering on about things that were clearly handled in the part of the document you haven't read…. Grrr!)

Be on time. Being late for a meeting is just plain rude. Whatever you think may be happening, the message it sends to other attendees is absolutely clear… 1. You don't value this meeting. 2.
You don’t value the work that the other attendees could have been doing. 3. You don’t value the companies time. (You’re 15 minutes late for a meeting with 10 people? That’s 150 minutes wasted!) Yes, as a Big Boss Man you may be entitled to be late and make others wait. Your time is very valuable. I’m comfortable with that. But just be aware that your actions SHOUT louder than any words you may utter. If there is an endemic culture of being late for meetings at your company… Why? Because the previous meeting was running late. Why was it running late? For all the reasons listed in this document! A smart phone with google calendar integration is a huge aid. Get one. The obvious corollary to this is Finish on time! Keep it short or have regular breaks. A meeting longer than 60 minutes is just plain cruel. If you must, remember “coffee (and/or bathroom) breaks”. I’m utterly convinced the primary attribute of management is they have concrete buttocks and huge bladders. Bring Food! Every culture I have read of has used eating together as a way of removing tension and social bonding. It works. Coping with the “Silent Ones”. The obvious solution of “encouraging them to speak” is a Bad Idea. Why? They are not speaking because… 1. it is painful to them and 2. they know anything they say, will, Heaven Forfend, set the noisy ones off repeating themselves YET AGAIN. The correct solution is for the noisy ones to SAY LESS, to give the Silent Ones an opportunity to speak if they wish. Be encouraging and accepting of what the Silent Ones do say, if they say. Make it clear it is acceptable for them to decline (with comment) the meeting if they feel they have nothing to contribute. Never say anything more than twice. If it didn’t convince them the last two times, it is not going change them on the third, fourth or fifth pass either. They have something stuck in their heads that will not let this idea in. 
Your choices are limited to finding out what is stuck in their heads and addressing that directly, which is a very, very difficult, error-prone task with a high failure rate, or… (Conversely, you may have something stuck in your head…. not repeating it will allow it to be dislodged.)

Speak up, propose a concrete action, shut up, do it. Do you hate meetings? Do you sit there in silent dread going shutup shutup shutup shutup…. knowing you have seen the issue under discussion come up meeting after meeting after meeting? Try this… when the meeting seems to be dithering endlessly, not seeing the obvious… speak up, propose a single concrete action. Shut up. The mistake is trying to force anyone to see your point or take your action. That causes meetings to explode in length. Rather, say your piece, then be quiet again…. Let the meeting end (even if it comes to entirely the wrong conclusion). Experience shows nine times out of ten, the same issue will be discussed in the next meeting. Why is this? Because despite everyone having a very strong opinion about it… odds on nobody has DONE anything about it. Opinions and actions are extremely different classes of entities. If you really disagree with the conclusion of a meeting, having made your suggestion, heard the opposing arguments, and defended your suggestion…. stop arguing and just do. And when the issue comes up at the next meeting, merely mention you have done what you said should be done. This will set off a furious and very stressful bout of complaints and discussion, and they will decide to do something else. No problem. Let them… you know they aren't going to get around to doing anything anyway. And since you solved the problem, it will fade off the agenda of the next meeting.

Take action minutes. This is a Grreat Tip. Hate meetings? Take the minutes. Odds on the Happy Shinies are here for another talkfest and are content if nothing comes out of it. So take the minutes.
Use a laptop and type them into Gmail as the meeting goes; hit send to all attendees at the end of the meeting. When they have almost decided something… give them a shove to nail it down. Say something like "So that will be Action: Joe Bloggs to do XXX." I promise you, this is the Best Meeting Survival tip ever.

Neither the Time, the Place, the Attendee List nor Your Presence is Sacred. Often our notion of "A Meeting" is set by our experience of a church. And we have a vague, uncomfortable feeling that something sacred is going on which we shouldn't disrupt. It is not sacred. The meeting time isn't a sabbath. The meeting place isn't a consecrated church…. if the sun is shining and the air is fresh and pleasant… get up and have the meeting outside. The attendee list isn't blessed. You really won't offend any god if you get up and stroll over to the desk of the guy who actually knows the answer… and ask him just to pop in for a few minutes. This works. Trust me.

Get up and leave. When you have said all that needs to be said, heard everything everyone else wishes to say (twice!), and they are beating around the same bush for the third time… press "Send" on those action minutes, quietly pack up your things… (sometimes that's enough), and then say, "Sorry, I have to go now… I have sent out the minutes. Thanks." And on the more humourous side…. at the intersection of Cancel / Walkout / Meet outdoors / Bring Food…. [These Durian flavoured (and scented) wafers](http://www.kisabird.com/wp-content/uploads/2010/05/kisabird_051910_durian.jpg) will achieve all objectives simultaneously!

#### OSI-approved Fair license

### Planet Theory

#### TR15-187 | A Note on Perfect Correctness by Derandomization | Nir Bitansky, Vinod Vaikuntanathan

In this note, we show how to transform a large class of erroneous cryptographic schemes into perfectly correct ones.
The transformation works for schemes that are correct on every input with probability noticeably larger than half, and are secure under parallel repetition. We assume the existence of one-way functions and of functions with deterministic (uniform) time complexity $2^{O(n)}$ and non-deterministic circuit complexity $2^{\Omega(n)}$. The transformation complements previous results showing that public-key encryption and indistinguishability obfuscation that err on a noticeable fraction of inputs can be turned into ones that are often correct {\em for all inputs}. The technique relies on the idea of ``reverse randomization'' [Naor, Crypto 1989] and on Nisan-Wigderson style derandomization, which was previously used in cryptography to obtain non-interactive witness-indistinguishable proofs and commitment schemes [Barak, Ong and Vadhan, Crypto 2003].

#### TR15-186 | On the Relationship between Statistical Zero-Knowledge and Statistical Randomized Encodings | Benny Applebaum, Pavel Raykov

\emph{Statistical zero-knowledge proofs} (Goldwasser, Micali and Rackoff, SICOMP 1989) allow a computationally-unbounded server to convince a computationally-limited client that an input $x$ is in a language $\Pi$ without revealing any additional information about $x$ that the client cannot compute by herself. \emph{Randomized encoding} (RE) of functions (Ishai and Kushilevitz, FOCS 2000) allows a computationally-limited client to publish a single (randomized) message, $E(x)$, from which the server learns whether $x$ is in $\Pi$ and nothing else. It is known that SRE, the class of problems that admit statistically private randomized encoding with polynomial-time client and computationally-unbounded server, is contained in the class SZK of problems that have statistical zero-knowledge proofs. However, the exact relation between these two classes, and, in particular, the possibility of equivalence, was left as an open problem.
In this paper, we explore the relationship between SRE and SZK, and derive the following results: (1) In a non-uniform setting, statistical randomized encoding with one-sided privacy (1RE) is equivalent to non-interactive statistical zero-knowledge (NISZK). These variants were studied in the past as natural relaxations/strengthenings of the original notions. Our theorem shows that proving SRE=SZK is equivalent to showing that 1RE=SRE and SZK=NISZK. The latter is a well-known open problem (Goldreich, Sahai, Vadhan, CRYPTO 1999). (2) If SRE is non-trivial (not in BPP), then infinitely-often one-way functions exist. The analogous hypothesis for SZK yields only \emph{auxiliary-input} one-way functions (Ostrovsky, Structure in Complexity Theory, 1991), which is believed to be a significantly weaker implication. (3) If there exists an average-case hard language with a \emph{perfect randomized encoding}, then collision-resistant hash functions (CRH) exist. Again, a similar assumption for SZK implies only constant-round statistically-hiding commitments, a primitive which seems weaker than CRH. We believe that our results sharpen the relationship between SRE and SZK and illuminate the core differences between these two classes.

### CompsciOverflow

#### One-shot Private Randomness Extractor

Suppose a pair of random variables $(X,Y)\in\mathcal{X}\times \mathcal{Y}$ with joint distribution $P_{XY}$ is given. I am interested in a deterministic mapping $f:\mathcal{Y}\to \{0, 1\}^k$, for some integer $k>0$, such that $$f(Y)\perp X, \qquad \text{i.e.}, \quad f(Y)\text{ is independent of } X,$$ and $$H(f(Y))>0.$$ The second condition is just to rule out the degenerate functions $f$. For example, if $X\sim\mathsf{Bernoulli}(p)$ and $P_{Y|X}$ is an erasure channel* with erasure probability $\delta>0$, then $f:\{0,\text{e},1\}\to \{0,1\}$, where $\text{e}$ is the erasure symbol, defined as $$f(1)=f(0)=1,\qquad f(\text{e})=0,$$ satisfies the above condition.
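As a quick sanity check of the erasure-channel example above, one can verify numerically that $f(Y)$ is independent of $X$: the event $f(Y)=1$ is exactly "no erasure occurred", which has probability $1-\delta$ regardless of $X$. A small Python sketch (my own illustration; the parameter values are arbitrary):

```python
from itertools import product

# Joint distribution P_XY: X ~ Bernoulli(p); Y is X passed through an
# erasure channel that outputs the erasure symbol 'e' with probability delta.
p, delta = 0.3, 0.25
pxy = {(x, y): (p if x == 1 else 1 - p)
               * (delta if y == 'e' else (1 - delta) * (y == x))
       for x, y in product([0, 1], [0, 1, 'e'])}

f = lambda y: 0 if y == 'e' else 1  # the extractor from the example

# P(f(Y) = 1 | X = x) should equal 1 - delta for both values of x,
# i.e. f(Y) carries no information about X.
for x in [0, 1]:
    p_x = sum(v for (xx, _), v in pxy.items() if xx == x)
    cond = sum(v for (xx, y), v in pxy.items() if xx == x and f(y) == 1) / p_x
    assert abs(cond - (1 - delta)) < 1e-12
```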
What is a necessary and sufficient condition on $P_{XY}$ for the existence of such a deterministic map? *By erasure channel, I mean the following conditional distribution: $$P_{Y|X}(y|x)=(1-\delta)1_{\{y=x\}},$$ and $$P_{Y|X}(\text{e}|x)=\delta, ~~~~\text{for}~~~~x=0,1.$$

### Lobsters

#### Loops in Lisp Part 1: Goto

### QuantOverflow

#### Pricing options under a specific framework

I have a specific framework in mind and I would like to value options under this framework. I am not sure whether a closed-form solution exists or whether Monte Carlo methods would work. The framework I have in mind is the one from Lettau and Wachter 2007 (paper here). In summary, this is the framework: Let $\epsilon_{t+1}$ denote a 3 × 1 vector of independent standard normal shocks that are independent of variables observed at time $t$. Let $D_t$ denote the aggregate dividend in the economy at time $t$, and $d_t = \ln D_t$. The aggregate dividend is assumed to evolve according to $\Delta d_{t+1} = g + z_t + \sigma_d \epsilon_{t+1}$, where $z_{t+1} = \phi_z z_t + \sigma_z \epsilon_{t+1}$. Also assume that the stochastic discount factor is driven by a single state variable $x_t$, where $x_{t+1} = (1-\phi_x) \bar{x} + \phi_x x_t + \sigma_x \epsilon_{t+1}$. Here $\sigma_d, \sigma_x, \sigma_z$ are all 1×3 vectors. The stochastic discount factor is exogenously defined as $M_{t+1} = \exp ( -r^f - \frac{1}{2} x_t^2 - x_t \epsilon_{d,t+1})$, where $\epsilon_{d,t+1} = \frac{\sigma_d}{\lVert \sigma_d \rVert} \epsilon_{t+1}$. It is quite straightforward to show that the price-dividend ratio of the equity claim, the sum of all the claims to future dividends, is $\frac{P_t^m}{D_t} = \sum_{n=1}^\infty \frac{P_{nt}}{D_t} = \sum_{n=1}^\infty \exp(A(n) + B_x(n) x_t + B_z(n) z_t)$, where $A, B_x, B_z$ are solved in closed form.
Now what I am looking for is a method to calculate the value of a call option with strike $K$ and maturity $\tau$ under this framework, meaning: $C(t,\tau,K) = E_t[M_{t,\tau}\max(P^m_\tau-K,0)]$. Not sure whether a closed-form solution exists... if not, would Monte Carlo work?

### StackOverflow

#### Writing a curried javascript function that can be called an arbitrary number of times that returns a value on the very last function call

I'm currently working on a programming problem in my personal time that asks that I make a javascript function that can be called in this manner:

add(1) // 1
add(1)(2) // 3
add(1)(2)(3); // 6
add(1)(2)(3)(4); // 10
add(1)(2)(3)(4)(5); // 15

What I'm having trouble figuring out is how to make it return a value on the very last call. For example, in order for add(1)(2) to work, add(1) has to return a function, but according to the instructions add(1) when called by itself will return 1. I'm assuming one way you can overcome this is to figure out how many times in succession the add function is being called, but I cannot think of a way to achieve that. Does anyone have any hints that can point me in the right direction? I've read these two articles (1, 2) on function currying and I understand them, but I'm not sure how to do currying when dealing with a variable number of arguments.

#### Using language models for term weighting

I understand that scikit supports n-grams using a Vectorizer. But those are only strings. I would like to use a statistical language model (https://en.wikipedia.org/wiki/Language_model) like this one: http://www.nltk.org/_modules/nltk/model/ngram.html. So, what I want is a Vectorizer using the probability as the term weight instead of, say, tf-idf or a simple token count. Is there a reason why this is not supported by scikit? I'm relatively inexperienced with language modeling, so I'm not sure if this approach is a good idea for text classification.
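One hint for the curried-add question above: return a value that is simultaneously a number and callable, so each call carries the running total while plain use yields the sum. In JavaScript this is usually done by returning a function whose valueOf/toString is overridden; the same shape can be sketched in Python (my own illustration, not the required JS answer) by subclassing int:

```python
class Add(int):
    """An int that can also be called to accumulate further addends."""
    def __call__(self, n):
        # Each call returns a fresh Add carrying the new running total.
        return Add(self + n)

def add(n):
    return Add(n)

print(add(1))              # 1
print(add(1)(2)(3))        # 6
print(add(1)(2)(3)(4)(5))  # 15
```

Because Add subclasses int, `add(1)` already compares equal to 1, and any longer chain is just the same object with a larger total; no counting of calls is needed.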
### CompsciOverflow

#### Data structures for ordering noisy data

In a certain robotics application, I encountered a problem in which we need to determine the order of positions of several robots on $\mathbb{R}$. Each measurement that we take of robot positions is subject to random errors beyond our control. I've formalised the algorithmic part of the problem as follows. This part has been reformulated as per D.W.'s comments.

The positions of $n$ robots are given to us in an $n \times 3$ array, each row being of the form $[i, A, \epsilon]$. This row states that the position of the robot with id $i$ is uniformly distributed on the interval $A \pm \epsilon$.

Problem: Output the most probable ordering (MPO) of the robot positions, or one such ordering if it is non-unique.

Example input: $\begin{bmatrix} \mathrm{id} & A & \epsilon \\ \hline 1 & 1 & 0.5 \\ 2 & 1.5 & 2 \end{bmatrix}$. Associate the two robot positions with random variables $X_1$ and $X_2$. Note that the pair $(X_1,X_2)$ is uniform on the rectangle $[0.5,1.5] \times [-0.5,3.5]$. The probability $p_1$ (resp. $p_2$) that $X_1\leq X_2$ (resp. $X_1 \geq X_2$) is the area of this rectangle below (resp. above) the line $x_1 = x_2$, divided by the total area. If $p_1 \geq p_2$, then the MPO is $[1,2]$, else $[2,1]$.

What data structure efficiently computes the MPO? I'm most interested in the range $n\in [10,1000]$.

Simple idea that doesn't work (thanks to D.W.): When $n\geq 3$, the MPO isn't necessarily equal to the sorted order of the means $A_i$. For example, the input $(1, 0.4\pm 0), (2, 0.5\pm 0.5), (3, 0.7 \pm 0)$ has the MPO $[2,1,3]$ and not $[1,2,3]$ as the $A$'s would have us think.

One possible approach uses the sweep line algorithm. Think of a robot position as a rectangle corresponding to its uniform pdf. Compute all intersecting rectangles using the priority queue of the sweep line, and reorder them as per the probability calculation above. (It may be possible to reduce the $O(n^2)$ pairwise comparison of probabilities to $O(n \log n)$; I haven't checked this though.)
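As a sanity check, the pairwise probability in the two-robot example above can be computed numerically. A small sketch (the helper name is mine; midpoint integration over $X_1$'s interval):

```python
def prob_le(a1, b1, a2, b2, steps=10000):
    """P(X1 <= X2) for independent X1 ~ U[a1, b1], X2 ~ U[a2, b2],
    computed by midpoint integration over X1's interval."""
    dx = (b1 - a1) / steps
    total = 0.0
    for i in range(steps):
        x = a1 + (i + 0.5) * dx
        # P(X2 >= x) for X2 ~ U[a2, b2], clamped to [0, 1]
        total += max(0.0, min(1.0, (b2 - x) / (b2 - a2)))
    return total / steps

p1 = prob_le(0.5, 1.5, -0.5, 3.5)
print(p1)  # ~0.625, so the order [1, 2] is the more probable one
```

For the example this gives $p_1 \approx 0.625 > p_2 \approx 0.375$, matching the geometric area argument.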
Non-overlapping rectangles leave their ordering intact. Intuitively, large error bounds lead to long rectangles and potentially more overlaps, consequently increasing the running time. Likewise, small error bounds cause few overlaps, speeding up computations. This fact will be reflected in the sweep line.

### Lobsters

#### What can a technologist do about climate change? A personal view.

#### I am done with bad meetings

#### Frameworks

### StackOverflow

#### Autos, Arrows and Free Monads

I've read two great series of articles about Autos and Free Monads and I'd like to combine these two techniques somehow. I'd like to have something like:

data ProgramF a = Get (String -> a) | Set String a

instance Functor ProgramF where ...

type Program = Free ProgramF

get' :: Program String
get' = liftF $ Get id

set' :: String -> Program ()
set' s = liftF $ Set s ()

auto1 :: AutoM Program () String
auto1 = arrM $ \_ -> get'

auto2 :: AutoM Program String ()
auto2 = arrM $ \s -> set' s

auto3 :: AutoM Program () ()
auto3 = auto1 >>> auto2
...

But there are some problems; for example, ArrowLoop requires Program to be an instance of MonadFix, which is not possible as far as I understand. So my questions are:

• Are there ways to make Auto and Free work together?
• And if not, are there other ways to achieve the goal?

Note: I'm pretty new to functional programming, so I understand from little to nothing in theory.

Update: In one of the comments it was mentioned that Auto is itself a form of fixpoint and I can use ProgramF directly with it. So I guess that the type of Auto should look something like this:

newtype Auto f a b = Auto (a -> f (b, Auto f a b))

But the problem now is that I can't figure out how to compose two Autos without f being a Monad. My end goal is to have some composable pieces of code with internal state and a way to purify my code, hiding all IO effects (like log or getLine) in some kind of interpreter. So I guess my real question is: how can I implement something described above? Maybe I'm doing it all wrong and there is a better way. Can someone please give a simple example or provide some links to something similar?

### TheoryOverflow

#### Conducting an empirical research regarding UML [on hold]

First of all, I'm sorry if I missed the correct site for asking this question. If so, please redirect me to the correct one.

OK, long story short: I came up with a concept that could make some UML diagrams easier to understand. I've implemented the concept in a prototype tool and now I want to validate that my concept actually increases the understandability of UML diagrams. So I'm planning an empirical study. To this end I've modeled a complex and a simple model.
Ideally I want two groups of participants: one that would answer the questions with my concept and another that would answer the same questions with 'native' UML, without my concept. I plan on measuring the number of correct answers and the time needed to answer each question. Here's where I'm stuck: what kind of questions should I ask: true/false questions, or questions where participants have to check all the correct answers? In the existing literature there are only true/false questions, but I think that is not as representative as actually selecting the correct parts of the question. So, are there any guidelines on how to conduct such an experiment? How do I form questions that measure understandability? Any kind of advice is greatly appreciated!

### Lobsters

#### LittleD: A relational database for embedded devices and sensors nodes

### CompsciOverflow

#### P time complexities theory

I am having trouble understanding reductions. I understand that if language $L_1$ can be reduced to language $L_2$ in linear time and $L_2\in\mathrm{P}$, this implies that $L_1\in\mathrm{P}$. But if we know $L_2$ has a time complexity of, let's say, $\Theta(n \log n)$, can we say that $L_1$ runs in $O(n \log n)$? Since the reduction from $L_1$ to $L_2$ is in linear time and $L_2$ runs in $\Theta(n \log n)$, the total will be $O(n) + \Theta(n \log n)$. And if $L_2$ can also be linearly reduced to $L_3$, can we say $L_3$ runs in $\Omega(n \log n)$?

### Lobsters

#### Recipe: The Best Darn HTTP Cookies

### QuantOverflow

#### Is it too important that my residuals be normal? I am using an ARMA/GARCH model

I am trying to fit an ARMA/GARCH model to a time series. I found that the best candidate is an ARMA(1,0) + GARCH(1,1) with Gaussian white noise. It has coefficients with p-values near zero and the residuals are white noise. The problem is that the Jarque-Bera test says the residuals are not normal. The QQ normal plot confirms that.
And when I try several ARMA/GARCH models with Student's t white noise, for example, the QQ t-plot fits very well (except for some outliers), but the rest don't (I mean, not as well as the first one). Which one is better? I have been stuck on this problem for a while. Thank you very much. Rodrigo

### StackOverflow

#### TensorFlow - why doesn't this softmax regression learn anything?

I am aiming to do big things with TensorFlow, but I'm trying to start small. I have small greyscale squares (with a little noise) and I want to classify them according to their colour (e.g. 3 categories: black, grey, white). I wrote a little Python class to generate squares, and 1-hot vectors, and modified their basic MNIST example to feed them in. But it won't learn anything - e.g. for 3 categories it always guesses ≈33% correct.

import tensorflow as tf
import generate_data.generate_greyscale

data_generator = generate_data.generate_greyscale.GenerateGreyScale(28, 28, 3, 0.05)
ds = data_generator.generate_data(10000)
ds_validation = data_generator.generate_data(500)
xs = ds[0]
ys = ds[1]
num_categories = data_generator.num_categories

x = tf.placeholder("float", [None, 28*28])
W = tf.Variable(tf.zeros([28*28, num_categories]))
b = tf.Variable(tf.zeros([num_categories]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder("float", [None, num_categories])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

# let batch_size = 100 --> therefore there are 100 batches of training data
xs = xs.reshape(100, 100, 28*28) # reshape into 100 minibatches of size 100
ys = ys.reshape((100, 100, num_categories)) # reshape into 100 minibatches of size 100

for i in range(100):
    batch_xs = xs[i]
    batch_ys = ys[i]
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
xs_validation = ds_validation[0]
ys_validation = ds_validation[1]
print sess.run(accuracy, feed_dict={x: xs_validation, y_: ys_validation})

My data generator looks like this:

import numpy as np
import random

class GenerateGreyScale():
    def __init__(self, num_rows, num_cols, num_categories, noise):
        self.num_rows = num_rows
        self.num_cols = num_cols
        self.num_categories = num_categories
        # set a level of noisiness for the data
        self.noise = noise

    def generate_label(self):
        lab = np.zeros(self.num_categories)
        lab[random.randint(0, self.num_categories-1)] = 1
        return lab

    def generate_datum(self, lab):
        i = np.where(lab==1)[0][0]
        frac = float(1)/(self.num_categories-1) * i
        arr = np.random.uniform(max(0, frac-self.noise), min(1, frac+self.noise), self.num_rows*self.num_cols)
        return arr

    def generate_data(self, num):
        data_arr = np.zeros((num, self.num_rows*self.num_cols))
        label_arr = np.zeros((num, self.num_categories))
        for i in range(0, num):
            label = self.generate_label()
            datum = self.generate_datum(label)
            data_arr[i] = datum
            label_arr[i] = label
        #data_arr = data_arr.astype(np.float32)
        #label_arr = label_arr.astype(np.float32)
        return data_arr, label_arr

### CompsciOverflow

#### Which Java should I learn for Android app building? [on hold]

There seem to be different types of Java. JavaScript is for web-based apps. Now that I want to learn to build Android apps from basics, which Java should I learn? Java Swing? I can't find any place where it is clearly stated that I should learn Java Swing, which is why I am asking the question.

### QuantOverflow

#### Can the money market break in a crisis situation?

I am researching what happens to the money market during a crisis situation. Can it break? In 2007-08 there is evidence that liquidity hoarding by banks became a rather common problem. (See f.ex.
Berrospide 2012.) I read in a risk statement (which I cannot find anymore) that the ECB does not see the money market as a secure way to fund a bank in a crisis situation, and from this I interpret that it sees only itself as the real constant during a financial crisis. Does the money market hold in a severe crisis situation? I appreciate your experienced answers!

### Planet Theory

#### Research instructorship at Princeton and IAS (apply by Dec 15, 2015)

These are newly created 3-year positions for outstanding young theorists. Combining research with teaching duties, these positions come with attractive benefits and working conditions. Typically, the 1st and 3rd years are spent at Princeton University and the 2nd year is spent at the IAS. These arrangements are flexible. The application process for this position is separate from Princeton postdoc applications.

Email: mbraverm@cs.princeton.edu

### StackOverflow

#### OpenCV image processing with SVM regression

Previously, my data were all stored in MySQL, which could easily be downloaded as a CSV. The syntax of my CSV file is as below:

ImageID, ImageOnlineLink, ImageDescription1, ImageDescription2, Score

The image can be accessed through the 2nd attribute (OnlineLink), while regression will be performed based on the Score, using SVM. The plan is to load the images into the system (about a few hundred), conduct a number of image-processing steps whose results would be used as additional attributes for the regression, and finally perform the training. In other words, the final syntax of the CSV file should be something like this:

ImageID, ImageOnlineLink, ImageDescription1, ImageDescription2 ... ImageDescriptionN (obtained through processing), Score

The documentation is not really helpful and I would need some suggestions on how to perform the above.

#### Final year programming project

I'm in my final year of a degree in Software Engineering and I am about to begin my final year project.
I am going to build a website that is responsive on mobiles/tablets etc. I am doing my project on a Student Money Tracker and I would appreciate any sort of help/ideas on how I can add more complexity to it or do something that isn't already out there. I've done a number of surveys but nothing so far has been overly useful. The project is based on the student adding in all of their incomings/outgoings for a month; the app calculates this and tells them how much is available to spend each day until their next pay-day/student loan etc. I am really struggling to think of a way to make this unique. I was thinking along the lines of incorporating some machine learning/prediction, but I am really struggling to get an exact idea of what to do! I truly appreciate any form of help/advice and thank you all in advance!!

### Lobsters

#### Microsoft's Software Is Malware

### High Scalability

#### Sponsored Post: StatusPage.io, iStreamPlanet, Redis Labs, Jut.io, SignalFx, InMemory.Net, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

## Who's Hiring?

• Senior Devops Engineer - StatusPage.io is looking for a senior devops engineer to help us in making the internet more transparent around downtime. Your mission: help us create a fast, scalable infrastructure that can be deployed to quickly and reliably.

• As a Networking & Systems Software Engineer at iStreamPlanet you'll be driving the design and implementation of a high-throughput video distribution system. Our cloud-based approach to video streaming requires terabytes of high-definition video routed throughout the world. You will work in a highly-collaborative, agile environment that thrives on success and eats big challenges for lunch. Please apply here.

• As a Scalable Storage Software Engineer at iStreamPlanet you'll be driving the design and implementation of numerous storage systems including software services, analytics and video archival.
Our cloud-based approach to world-wide video streaming requires performant, scalable, and reliable storage and processing of data. You will work on small, collaborative teams to solve big problems, where you can see the impact of your work on the business. Please apply here.

• At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you'll learn new things, and invent a few of your own. Learn more and apply.

• UI Engineer - AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

• Software Engineer - Infrastructure & Big Data - AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.

## Fun and Informative Events

• Your event could be here. How cool is that?

## Cool Products and Services

• Real-time correlation across your logs, metrics and events. Jut.io just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics.
We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

• Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Sumo Logic alternative.

• SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations - such as percentiles, moving averages and growth rates - within seconds of receiving data. Start a free 30-day trial!

• InMemory.Net provides a .NET native in-memory database for analysing large amounts of data. It runs natively on .NET, and provides native .NET, COM & ODBC APIs for integration. It also has an easy-to-use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

• VividCortex goes beyond monitoring and measures the system's work on your servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

• MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL.
MemSQL also cost-effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

• aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

• ManageEngine Applications Manager: Monitor physical, virtual and Cloud Applications.

• www.site24x7.com: Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

### TheoryOverflow

#### Prove that a DFA over a binary alphabet with k states can recognise a maximum of k^(2k+1) * 2^k languages [on hold]

I was given this question in a test and I couldn't do it. After trying a little more I am still unable to do it. I think I am missing something, but I'm not sure what. Can anybody help me?

### Lobsters

#### Fair Source License

### StackOverflow

#### Using Weka to Classify Author Blog Gender

I am trying to use Weka in Java to classify an author's blog as written by a male or female. I built a class called Weka which defines the attributes to be used in the training set and then calls a method to load all the already-known data from an Excel sheet. The data in the file is organized with blog text in cell 0 of each row and an M or an F in cell 1:

blog text M
more text F

I am also following this tutorial a little: Weka Java Tutorial. When I run the program I start to see text whizz by in the console window in Eclipse, but suddenly I get a red error that says "Value not defined for given nominal attribute!" I'm not quite sure why this happens. The text is changing from row to row, so I thought it was not possible to define all the nominal attributes.
Can anyone see what I'm doing wrong or stupid here??? I would greatly appreciate any help. I've been stuck on this for a couple of hours.

CODE:

public class Weka {

    static FastVector fvWekaAttributes;
    static Instances isTrainingSet;
    static Classifier cModel;

    public static void main(String[] args) throws Exception {
        // Declaring attributes
        Attribute stringAttribute = new Attribute("text", (FastVector) null);

        // Declaring a class attribute along with its values
        FastVector fastVClassVal = new FastVector(2);
        fastVClassVal.addElement("M");
        fastVClassVal.addElement("F");
        Attribute classAttribute = new Attribute("theClass", fastVClassVal);

        // Declaring the feature vector
        fvWekaAttributes = new FastVector(2);
        fvWekaAttributes.addElement(stringAttribute);
        fvWekaAttributes.addElement(classAttribute);

        // create the training set
        isTrainingSet = new Instances("Rel", fvWekaAttributes, 10);

        // set class index
        isTrainingSet.setClassIndex(1);

        // create however many instances are in my excel file
        // and add them to the training set in a loop.
        Weka.LoadExcelWorkBook(isTrainingSet);
        Weka.TestSetWork();
    }

    public static void TestSetWork() throws Exception {
        // test the model
        Evaluation testing = new Evaluation(isTrainingSet);
        testing.evaluateModel(cModel, isTrainingSet);

        // printing the results....
        String strSummary = testing.toSummaryString();
        System.out.println(strSummary);

        // get the confusion matrix.
        double[][] cmMatrix = testing.confusionMatrix();
        for (int i = 0; i < cmMatrix.length; i++) {
            for (int col = 0; col < cmMatrix.length; col++) {
                System.out.print(cmMatrix[i][col]);
                System.out.print("|");
            }
            System.out.println();
        }
    }

    public static void LoadExcelWorkBook(Instances trainingSet) throws Exception {
        System.out.println("LOADING EXCEL WORKBOOK!!!");
        Workbook wb = null;

        // opening excel file.
        try {
            wb = WorkbookFactory.create(new File("C://blog-gender-dataset.xlsx"));
        } catch (IOException ieo) {
            ieo.printStackTrace();
        }

        // opening worksheet.
        Sheet sheet = wb.getSheetAt(0);

        StringToWordVector filter = new StringToWordVector();
        filter.setInputFormat(isTrainingSet);
        Instances dataFiltered = Filter.useFilter(isTrainingSet, filter);

        for (Row row : sheet) {
            Cell textCell = row.getCell(0);
            Cell MFCell = row.getCell(1);
            String blogText = textCell.getStringCellValue();
            String MFIndicator = MFCell.getStringCellValue();
            System.out.println("TEXT FROM EXCEL " + blogText);

            Instance iText = new Instance(2);
            iText.setValue((Attribute) fvWekaAttributes.elementAt(0), blogText);
            iText.setValue((Attribute) fvWekaAttributes.elementAt(1), MFIndicator);
            isTrainingSet.add(iText);

            cModel = (Classifier) new J48();
            cModel.buildClassifier(dataFiltered);
        }
    }
}

### Lobsters

#### Forgo JS packaging? Not so fast

### QuantOverflow

#### GARCH parameters

I'm trying to estimate the parameters of a GARCH(p,q) model. I tried p=1, q=1 with t-distributed errors. The Ljung-Box test showed no correlation in the residuals or squared residuals, but the null hypothesis that the ARCH term's coefficient equals 0 was not rejected. So I tried p=0, q=1; Ljung-Box then indicated serial correlation in the residuals and squared residuals. Moreover, AIC and SC chose the former model. Should I choose GARCH(1,1), even though one coefficient is statistically insignificant?

### Lobsters

#### Sourcegraph, a new self-hosted Git service with semantic code navigation, search, and review

### StackOverflow

#### Lemmatizer supporting german language (for commercial and research purpose)

I am searching for lemmatization software which:

• supports the German language
• has a license that allows it to be used for commercial and research purposes (an LGPL license would be good)
• is preferably implemented in Java (implementations in other programming languages would also be OK)

Does anybody know of such a lemmatizer? Regards,

UPDATE: Hi Daniel, At first, thank you for the great work you are providing with the LanguageTool.
We would like to index German texts into Elasticsearch (ES) and pre-analyze them using either an ES built-in German stemmer (please see https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html) or the following plugin: https://github.com/jprante/elasticsearch-analysis-baseform. The latter uses your morphology file under http://www.danielnaber.de/morphologie/morphy-mapping-20110717.latin1.gz, and that is why I thought you might have some evaluation data showing the trade-off of using lemmatization based on your morphology file instead of an ES built-in stemmer. Do you have any figures on the precision/coverage of your German morphology, or comparative data with the German stemmers used in Elasticsearch? Best regards

#### Categorical data transformation in Scikit-Learn

I have a 40 million x 22 numpy array of integer data for a classification task. Most of the features are categorical data that use different integer values to represent different categories. For example, in the column "Color": 0 means blue, 1 means red and so on. I have preprocessed the data using LabelEncoder.

1. Does it make sense to fit these data into any classification model in SK-learn? I tried to fit the data into a Random Forest model but got extremely bad accuracy. I also tried One Hot Encoding to transform the data into dummy variables, but my computer can only deal with a sparse matrix after using One Hot Encoding; the problem is that Random Forest can only take a dense matrix, which would exceed my computer's memory.

2. What's the correct strategy to deal with categorical data in SK-learn?

#### What are Naive Bayes properties: useKernelEstimator and useSupervisedDiscretization

I am using Naive Bayes as the learning algorithm in the Weka data mining tool. There are parameter options in Naive Bayes named 'useKernelEstimator' and 'useSupervisedDiscretization'. Can someone please tell me what these two parameters are?
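For context (based on my reading of Weka's NaiveBayes options, so treat the details as an assumption): `useSupervisedDiscretization` bins numeric attributes into intervals using the class labels and then treats them as nominal, while `useKernelEstimator` replaces the single Gaussian fitted per class and attribute with a kernel density estimate, i.e. an average of one small Gaussian centred on each training value. The difference is easiest to see on bimodal data; a sketch in Python (helper names are mine, not Weka's):

```python
import math

def gaussian(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def single_gaussian_likelihood(x, samples):
    """The default: one normal distribution fitted to the whole attribute."""
    mu = sum(samples) / len(samples)
    var = sum((s - mu) ** 2 for s in samples) / len(samples)
    return gaussian(x, mu, math.sqrt(var))

def kernel_likelihood(x, samples, bandwidth=0.5):
    """Kernel estimator: one small Gaussian per training value, averaged."""
    return sum(gaussian(x, s, bandwidth) for s in samples) / len(samples)

bimodal = [-2.0, -2.0, 2.0, 2.0]
print(single_gaussian_likelihood(0.0, bimodal))  # puts mass in the empty middle
print(kernel_likelihood(0.0, bimodal))           # correctly sees the density dip
```

On such data the single Gaussian assigns high likelihood to the empty region between the modes, while the kernel estimate concentrates likelihood near the actual training values.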
### CompsciOverflow

#### Approximation Algorithm for Metric TSP [on hold]

We have an undirected complete graph G where all edges have length 1 or 2 and thus satisfy the triangle inequality. What would a 4/3-factor approximation algorithm for this TSP problem look like? One possible approximation approach (which does not achieve the 4/3 factor) is: find a minimum spanning tree, duplicate its edges to create an Eulerian graph, find an Eulerian tour and convert it to a TSP tour. Could you suggest any papers on this?

### QuantOverflow

#### Fair Price CDS Spread for a Bank

I have been using CreditGrades to calculate fair one-year CDS spreads for firms. However, the authors of the model explicitly say that the model does not hold for banks or financial firms. If I need to price a theoretical one-year CDS spread for a financial firm, what are some suggestions or starting points for an alternative model? Could I potentially modify the assumptions of the CreditGrades model to hold for banks? This is the CreditGrades technical document: http://www.creditrisk.ru/publications/files_attached/cgtechdoc.pdf

### Lobsters

#### The Golden Gigaflop

#### Fontdeck is Retiring

### Fefe

#### Imagine you learn from the newspaper that ...

#### Pretty cool slide deck on the topic of benchmark and ...

### QuantOverflow

#### garchOxFit in R

Could someone please help me with trying to get the Ox interface to work in R? I followed the steps outlined in this paper (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1752095), but I get the following errors as output:

Ox Console version 6.21 (Windows/U) (C) J.A.
Doornik, 1994-2011. This version may be used for academic research and teaching only.

C:\Ox\lib\GarchOxModelling.ox (28): 'fopen' undeclared identifier
C:\Ox\lib\GarchOxModelling.ox (29): 'fscan' undeclared identifier
C:\Ox\lib\GarchOxModelling.ox (39): 'fclose' undeclared identifier
C:\Ox\lib\GarchOxModelling.ox (227): 'fprint' undeclared identifier
Error in file(file, "r") : cannot open the connection
In addition: Warning messages:
1: running command 'C:\Ox\bin\oxl.exe C:\Ox\lib\GarchOxModelling.ox' had status 1
2: In file(file, "r") : cannot open file 'OxResiduals.csv': No such file or directory

How would I go about solving the undeclared identifier problem, and then the two additional warning messages? I am really out of my depth here but really need to use FIGARCH and possibly FIEGARCH in R. Thanks for any help provided.

### DataTau

#### Rodeo 1.1 - Markdown, Autoupdates, Feedback

### QuantOverflow

#### garchOxFit in R - .oxo file does not match

Could someone please help me with trying to get the Ox interface to work in R? I get the following errors as output:

This version may be used for academic research and teaching only
Link error: 'packages/Garch42/garch' please recompile .oxo file to match this version of Ox
Error occurred in file(file, "r") : the connection could not be opened
In addition: Warning messages:
1: running command 'C:\Ox\bin\oxl.exe C:\Ox\lib\GarchOxModelling.ox' had status 1
2: In file(file, "r") : file 'OxResiduals.csv' could not be opened: No such file or directory

#### Equivalent Definitions of Self-Financing Portfolio

Consider a multi-period model with $t=0,\dots,T$. Suppose there is a bond with $B_0=1$ and $B_t=(1+R)^t$, and a stock with $S_0=s_0$ and $$S_{t+1}=S_t\,\xi_{t+1},$$ with $\xi_t$ iid random variables. I indicate with $(\alpha_t,\beta_t)$ the predictable components of the portfolio. I have found two different definitions of self-financing portfolio that I would like to reconcile.
In the book by Tomas Björk ("Arbitrage Theory in Continuous Time") it is said that the value of the portfolio at time $t$ is $$V_t^{(\alpha,\beta)} = \alpha_t\,S_t+\beta_t\,(1+R)$$ and the self-financing condition is expressed as $$\alpha_t\,S_t+\beta_t\,(1+R) = \alpha_{t+1}\,S_t+\beta_{t+1}.$$ This is quite intuitive for me since, in Björk, $\alpha_t$ (resp. $\beta_t$) is, by definition, the amount of money we invest in the stock (resp. in the bond) at time $t-1$ and keep up to time $t$. So if I buy $\beta_t$ units of the bond at time $t-1$ then I gain $\beta_t\,(1+R)$ at time $t$.

Nevertheless, in the book by Andrea Pascucci ("PDE and Martingale Methods in Option Pricing") it is said that the value of the portfolio is $$V_t^{(\alpha,\beta)} = \alpha_t\,S_t+\beta_t\,B_t = \alpha_t\,S_t+\beta_t\,(1+R)^t$$ and the self-financing condition is expressed as $$V_t^{(\alpha,\beta)} =\alpha_t\,S_t+\beta_t\,(1+R)^t = \alpha_{t+1}\,S_t+\beta_{t+1}\,(1+R)^t.$$ Pascucci defines $\alpha_t$ (resp. $\beta_t$) as the amount of the asset $S$ (resp. of the bond $B$) held in the portfolio during the period $[t-1,t]$.

Are the two definitions equivalent? I am pretty sure that the solution lies in the fact that in Björk it is defined as the amount invested while in Pascucci as the amount held. Nevertheless, I don't see what kind of relationship holds between the two.

### CompsciOverflow

#### Modified counting sort algorithm?

So I have an array $A$, already sorted with counting sort. Now I reduce one randomly chosen element $j$ with $A[j]>0$ by $x \in \{1, \dots, A[j]\}$. I still have the counting array $C$ from the counting sort. The question is: how can I now sort the modified array in $O(k)$?

So far: Obviously I won't need to check all the elements of $A$ again, otherwise it would be $O(n)$. I have to modify the $C$ array somehow. If you subtract $C[j+1] - C[j]$ you have the number of elements in the respective interval (e.g. $0 - 0 = 0$ means 0 elements of value 0, $1-0 = 1$ means 1 element of value 1).
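A sketch of one update scheme, assuming the variant of counting sort where $C$ stores per-value counts (helper names are mine): after one element drops from `old_val` to `new_val`, only two counters change, which is $O(1)$; emitting the sorted sequence from the counts is then $O(n + k)$:

```python
def update_counts(C, old_val, new_val):
    """O(1): move one element's count from old_val to new_val."""
    C[old_val] -= 1
    C[new_val] += 1

def output_sorted(C):
    """O(n + k): expand the counts back into a sorted array."""
    return [v for v in range(len(C)) for _ in range(C[v])]

# A = [1, 3, 4, 4] over values 0..4  ->  C = [0, 1, 0, 1, 2]
C = [0, 1, 0, 1, 2]
update_counts(C, 4, 0)     # the reduced element: 4 -> 0
print(output_sorted(C))    # [0, 1, 3, 4]
```

If $C$ instead stores prefix sums, the same change decrements every prefix entry in the half-open range between the new and old value, which is $O(k)$ in the worst case.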
But this is still only for the old array. How can I tell which element of $A$ was reduced by looking only at $C$?

Example: sorted $A = \{1,3,4,4\}$; reduce, let's say, the element $A[2] = 4$ (i.e. $j = 2$) by $x \in \{1, \dots, 4\}$. Let $x = 4$; then we have $A = \{1,3,0,4\}$. How can I now sort it in $O(k)$ to get $A = \{0,1,3,4\}$?

### Planet Theory

#### TR15-185 | Lower bounds for constant query affine-invariant LCCs and LTCs | Sivakanth Gopi, Arnab Bhattacharyya

Affine-invariant codes are codes whose coordinates form a vector space over a finite field and which are invariant under affine transformations of the coordinate space. They form a natural, well-studied class of codes; they include popular codes such as Reed-Muller and Reed-Solomon. A particularly appealing feature of affine-invariant codes is that they seem well-suited to admit local correctors and testers. In this work, we give lower bounds on the length of locally correctable and locally testable affine-invariant codes with constant query complexity. We show that if a code $\mathcal{C} \subset \Sigma^{\mathbb{K}^n}$ is an $r$-query locally correctable code (LCC), where $\mathbb{K}$ is a finite field and $\Sigma$ is a finite alphabet, then the number of codewords in $\mathcal{C}$ is at most $\exp(O_{\mathbb{K}, r, |\Sigma|}(n^{r-1}))$. Also, we show that if $\mathcal{C} \subset \Sigma^{\mathbb{K}^n}$ is an $r$-query locally testable code (LTC), then the number of codewords in $\mathcal{C}$ is at most $\exp(O_{\mathbb{K}, r, |\Sigma|}(n^{r-2}))$. The dependence on $n$ in these bounds is tight for constant-query LCCs/LTCs, since Guo, Kopparty and Sudan (ITCS 13) construct affine-invariant codes via lifting that have the same asymptotic tradeoffs. Note that our result holds for non-linear codes, whereas previously, Ben-Sasson and Sudan (RANDOM 11) assumed linearity to derive similar results. Our analysis uses higher-order Fourier analysis.
In particular, we show that the codewords corresponding to an affine-invariant LCC/LTC must be far from each other with respect to the Gowers norm of an appropriate order. This then allows us to bound the number of codewords, using known decomposition theorems which approximate any bounded function in terms of a finite number of low-degree non-classical polynomials, up to a small error in the Gowers norm.

### StackOverflow

#### F# XML parsing

This C# code is probably not the most efficient, but it gets what I want done. How do I accomplish the same thing in F#?

string xml = " <EmailList> " +
             " <Email>test@email.com</Email> " +
             " <Email>test2@email.com</Email> " +
             " </EmailList> ";

XmlDocument xdoc = new XmlDocument();
XmlNodeList nodeList;
String emailList = string.Empty;

xdoc.LoadXml(xml);
nodeList = xdoc.SelectNodes("//EmailList");

foreach (XmlNode item in nodeList)
{
    foreach (XmlNode email in item)
    {
        emailList += email.InnerText.ToString() + Environment.NewLine;
    }
}

### TheoryOverflow

#### Can the set cover problem be reduced to the metric dimension problem?

In the introduction of metric dimension, the set cover problem and the metric dimension problem are very similar, and we can reduce the metric dimension problem to the set cover problem. However, all the proofs of the NP-hardness of the metric dimension problem reduce the 3-SAT problem to the metric dimension problem (see here for a reduction). I am wondering if we can reduce the set cover problem to the metric dimension problem. It seems very hard, because usually if one direction is easy to prove, the other direction is very hard. I have no idea how to transform an instance of the set cover problem into a graph such that its metric dimension equals the least number of sets needed in the set cover problem. Is such a question meaningless, or is it just impossible to prove?
Edit: As in the comments, I want to know what the explicit reduction looks like, i.e., how we reduce an instance of the set cover problem to an instance of the metric dimension problem and how a black box for the metric dimension problem solves the set cover problem.

### StackOverflow

#### What are the centroids of k-means clusters with PCA decomposition?

From a dataset on which I am using PCA and k-means, I would like to know what the central objects in each cluster are. What is the best way to describe these objects as irises from my original dataset?

from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target

from sklearn.decomposition import PCA
pca = PCA(n_components=2, whiten=True).fit(X)
X_pca = pca.transform(X)

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3).fit(X_pca)

# I can get the central object from the reduced data, but this does not help me describe
# the properties of the center of each cluster
from sklearn.metrics import pairwise_distances_argmin_min
closest, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, X_pca)
for i in closest:
    print X_pca[i]

### QuantOverflow

#### What risks is an exchange exposed to?

Putting aside operational/reputational/business risks for a minute, a financial institution is concerned with the risk of losing money on its positions. What about an exchange? I can only think of the case where multiple counterparties with huge exposures default simultaneously, or close to each other, totaling more than the clearing members can cover. What other risks (other than operational/reputational) might they face?

### Lobsters

#### JavaScript dates, trains, Passover, and Henry VIII

### Fefe

#### One of the Americans' strategies in drone assassinations ...

One of the Americans' strategies in drone assassinations is, after murdering the initial victims, to then also deliberately slaughter the first responders. As we recall: the name for this is Double Tap.
I was just reminded of this when I read that, after the downing of the fighter jet, the Russians also had their rescue helicopter shot down. Now, of course, it is completely unclear, first, whether it really was shot down; second, by whom; and third, whether it was by the same people who downed the aircraft. But it does feel a bit like the beginning of World War III.

Russian President Vladimir Putin has called an emergency meeting over the downing of the jet and unverified sources have claimed Russia is currently sending a warship across the Dardenelles from the Black Sea into the Mediterranean.

#### On that story about the high-frequency tone used as an ad tracker ...

#### Old and busted: giant corporation buys startup. New ...

Old and busted: giant corporation buys startup. New hotness: tiny Irish pharma company buys Pfizer for 160 billion. This is a maneuver by Pfizer so as to end up as an Irish company and save taxes in the USA. Update: "tiny" is meant relative to Pfizer.

#### Turkey has apparently shot down a Russian fighter jet. ...

Turkey has apparently shot down a Russian fighter jet. Turkey says it violated its borders. The Russians confirm the shoot-down but deny the border violation. Unsurprisingly, Putin is rather angry, but things are also rumbling on the other side: Turkey has summoned the Russian ambassador, and NATO has called a special session.

#### Do you like emotional speeches? Then I recommend ...

Do you like emotional speeches? Then I recommend this Bernie Sanders campaign introduction. It is five minutes of a rapper in Atlanta welcoming Sanders.
### TheoryOverflow

#### Confusions about the technique for verifying implementations of linearizable objects in [Herlihy and Wing, 1990]

In Section 4.3.2, entitled "Proof Method", the authors describe the technique for verifying implementations of linearizable objects as follows:

Assume that the implementation of $r$ is correct, hence $\text{H} \mid {\small \text{REP}}$ is linearizable for all $\text{H}$ in the implementation. Our verification technique focuses on showing the following property (denoted $\mathcal{P}$ for later reference, by myself): $$\text{For all } r \in \text{Lin}(\text{H}\mid{\small\text{REP}}), \text{I}(r) \text{ holds and } \text{A}(r) \subseteq \text{Lin}(\text{H}\mid{\small\text{ABS}}).$$ This condition implies that $\text{Lin}(\text{H} \mid {\small \text{ABS}})$ is nonempty, hence that $\text{H} \mid {\small \text{ABS}}$ is linearizable.

I don't understand: why is it sufficient, for the verification purpose, to show that the "subset" property on $\text{A}(r)$ (besides the "invariant" property $\text{I}(r)$) holds? Specifically, to show the property $\mathcal{P}$, are we showing that there exist an $\text{I}(r)$ and an $\text{A}(r)$ such that $\mathcal{P}$ holds? If so, why can't we just set $\text{I}(r) = \top$ and $\text{A}(r) = \emptyset$, which trivially satisfy $\mathcal{P}$? If not, what is the formal definition of "verification" here?

About the examples in this paper:

• Section 4.3.3: in the queue example, $\text{I}(r)$ and $\text{A}(r)$ are far from trivial. What are the specifications that prevent us from choosing $\text{A}(r) = \emptyset$ in this example?

• Section 4.2: to motivate their definitions of $\text{I}(r)$ and $\text{A}(r)$, the authors also use the queue example. At the top of page 477, it reads: Let $r$ be the rep value after this history. Because $\beta$'s Enq operation has returned, $\text{A}(r)$ must reflect $\beta$'s Enq.
Because $\alpha$'s Enq operation is still in progress, $\text{A}(r)$ may or may not reflect $\alpha$'s Enq, depending on how $\text{A}$ is defined.

Note that I have used $\alpha, \beta$ instead of $A, B$ to denote the processes, to avoid symbol abuse. According to this argument, I suppose that there are some constraints on the definition of $\text{A}$. But what are they?

### StackOverflow

#### Useful code which uses reduce() in Python

Does anyone here have any useful code which uses the reduce() function in Python? Is there any code other than the usual + and * that we see in the examples? See "Fate of reduce() in Python 3000" by GvR.

### CompsciOverflow

#### Virtual memory smaller than main memory. Is it possible? [on hold]

Suggest a situation where it would be advantageous to define a virtual memory smaller than the main memory. Similarly, suggest a situation where the use of cache memory would be detrimental.

#### Don't understand the merge part of mergesort

I understand how the divide part of the algorithm works and how it is meant to divide the work. What I don't understand is how you would merge blocks [7][14] and [3][12], or [9][11] and [2][6]. For the latter it's easy enough, as you just concatenate them (although how do you know that?), but for [7][14] and [3][12] you have to rearrange their indexes for the order to be increasing. How do you implement that step in software/pseudocode?

### QuantOverflow

#### Value-at-Risk formula when using a skewed-t distribution

I am trying to find a formula for the skewed-t VaR. For example, the VaR formula for a t-distribution is $$\sqrt{\frac{df-2}{df}} \times \sigma_t \times \mbox{quantile}(t\mbox{-dist}, 0.01) + \mu$$ (please excuse the messy formula; $\sigma_t$ denotes a GARCH model). However, I am struggling to do the same for a skewed-t distribution. I am using the rugarch package in R and I am struggling to find out which version of the skewed-t distribution is being used.
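For the plain symmetric-t case, the formula above can be checked numerically. A minimal sketch (not from the post; the GARCH volatility, mean, and degrees of freedom below are made-up illustrative values):

```python
from math import sqrt
from scipy.stats import t

def t_var(sigma_t, mu, df, alpha=0.01):
    """One-day VaR under a symmetric Student-t assumption.

    sqrt((df-2)/df) rescales the t quantile so that the scaled
    distribution has standard deviation sigma_t (a raw t variate
    has standard deviation sqrt(df/(df-2))).
    """
    return sqrt((df - 2) / df) * sigma_t * t.ppf(alpha, df) + mu

# Hypothetical inputs: 2% daily GARCH vol, 0.05% daily mean, 5 d.o.f.
var_99 = t_var(sigma_t=0.02, mu=0.0005, df=5)
print(var_99)  # a negative return of roughly -5% at the 99% level
```

The same template works for a skewed-t once its quantile function is pinned down; the difficulty the asker raises is precisely that different packages parameterize the skewed-t differently.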
I went to the fGarch pdf and downloaded the reference "On Bayesian Modelling of Fat Tails and Skewness" by C. Fernandez et al., but my lack of Bayesian knowledge means the pdf it says is the skew-Student is not helping perhaps as much as it should. Any help would be much appreciated.

### UnixOverflow

#### How to mount /boot from a LiveCD (FreeBSD 11, ZFS)

FreeBSD 11 (current) with ZFS. I can mount zroot with

zpool import -fR /mnt zroot

but /mnt/boot is empty (and it's not even a directory). I need to edit loader.conf; how can I do it?

### StackOverflow

#### I'll purchase a server or workstation for my lab to run neural network models; is there any advice?

Our lab studies natural language processing. We need to run neural networks to process our tasks. I know nothing about how to choose a server (or workstation). If you can give me some advice on hardware configuration or recommend a specific type, please help me. Thanks a lot!

### Fefe

#### Switzerland is thinking out loud about requiring Threema to build in ...

### CompsciOverflow

#### Shortest path visiting all nodes in an unrooted tree

I have a tree. This tree has no particular root (a free tree). I want to find a path that visits all nodes. This path has to be the shortest possible. All edges are considered to have a distance of 1. My problem is not finding a Hamiltonian path, because I don't have any restriction on the number of times a node can be visited. We don't know the source node of the path. The source has to be one allowing us to find the shortest path visiting all nodes.

### StackOverflow

#### Set attribute as class in WEKA

I am trying to classify some texts. What is the difference between the following ways to set an attribute as the class? Using these two ways, I received different results:

1. edit - (right click) choose 'set attribute as class' - on the Preprocess tab
2.
choose '(Nom) @@class@@' from the drop-down list - on the Classify tab

### QuantOverflow

#### FIGARCH estimation in R

I am trying to estimate a FIGARCH(1,1) model in R for Value-at-Risk purposes. As I understand it, the rugarch package does not support FIGARCH or FIEGARCH. To that end, I used the garchOxFit function (which runs the estimation in Ox while interfacing with R). It all works, and I am left with the fitted conditional volatility and the parameter estimates. My problem now is to use that to get the analytic VaR estimate for the next day. For a simple GARCH(1,1) that is fine: take the last estimated conditional volatility of the sample as well as the last squared residual; plug those into the GARCH equation along with the parameter estimates to get the next day's predicted volatility. One would then use a quantile function based on whatever distribution was assumed to calculate the analytic VaR. The problem is I am too simple to see how to get the volatility estimate with a FIGARCH model. I have the following maximum likelihood estimates for the FIGARCH parameters:

Cst(V) x 10^4 : 0.076547   # i.e. the constant in the GARCH equation (omega)
d-Figarch     : 0.584467
ARCH(Phi1)    : 0.122547
GARCH(Beta1)  : 0.643318

I have looked at Bollerslev's initial paper on FIGARCH, and am still clueless as to how one gets the next recursive volatility estimate given the parameter estimates and the previous day's volatility and squared residual. Any ideas? Any help would be very much appreciated.

### StackOverflow

#### Read/write another process's memory monad in F#

I am working on a cheat for a single-player game. I like function composition, immutability and code without boilerplate, which is why I decided to write the cheat in F#. I finished something which works fine. I know that the code is far from perfect, but I only started my journey with F# today. I wonder if there is a way to somehow remove the side effects from my code.
Could you give me some hints how this can be achieved? Thanks, Rafal

open System
open System.Diagnostics
open System.Runtime.InteropServices
open System.Text
open System.ComponentModel

let flip f x y = f y x
let curry f a b = f (a,b)
let uncurry f (a,b) = f a b

type MemoryOperation = int -> int -> int -> byte[]

//(f:int * int * byte[] * int * byref<int> -> bool)
[<DllImport("kernel32.dll")>]
extern IntPtr OpenProcess(int dwDesiredAccess, bool bInheritHandle, int dwProcessId)

[<DllImport("kernel32.dll")>]
extern bool WriteProcessMemory(int hProcess, int lpBaseAddress, byte[] lpBuffer, int dwSize, int& lpNumberOfBytesWritten)

//let WriteMemory hProcess lpBaseAddress dwSize =
//    let mutable buffer = Array.init dwSize byte
//    let mutable lpNumberOfBytesWritten = 0
//    WriteProcessMemory(hProcess, lpBaseAddress, buffer, dwSize, &lpNumberOfBytesWritten) |> ignore
//    buffer

[<DllImport("kernel32.dll")>]
extern bool ReadProcessMemory(int hProcess, int lpBaseAddress, byte[] lpBuffer, int dwSize, int& lpNumberOfBytesRead)

let ReadMemory hProcess lpBaseAddress dwSize =
    let mutable buffer = Array.init dwSize byte
    let mutable lpNumberOfBytesRidden = 0
    ReadProcessMemory(hProcess, lpBaseAddress, buffer, dwSize, &lpNumberOfBytesRidden) |> ignore
    buffer

let gameProcesses = Array.toList(Process.GetProcessesByName("gameName"))

let openProcess (p: Process) =
    let PROCESS_WM_READ = 0x0010
    OpenProcess(PROCESS_WM_READ, false, p.Id)

let readMemory<'a>(bitConverter:byte[] -> 'a)(length:int)(memoryOperation: MemoryOperation)(memory:int)(ptr:IntPtr) =
    (memoryOperation ((int)ptr) memory length) |> bitConverter

let bitConverter func = func |> curry |> flip <| 0 |> readMemory

let intIO = bitConverter BitConverter.ToInt32 4
let booleanIO = bitConverter BitConverter.ToBoolean 1
let charIO = bitConverter BitConverter.ToChar 1

let readInt = intIO ReadMemory
let readBoolean = booleanIO ReadMemory
let readChar = charIO ReadMemory

//let writeInt = intIO WriteMemory
//let writeBoolean = booleanIO WriteMemory
//let writeChar = charIO WriteMemory

let readHp = readInt 0x00A20D58

[<EntryPoint>]
let main argv =
    while true do
        gameProcesses
        |> (openProcess |> List.map)
        |> (readHp |> List.map)
        |> List.iter (printfn "%d")
    0

### Fred Wilson

#### Power Law And The Long Tail

If you look at the distribution of outcomes in a venture fund, you will see that it is a classic power law curve, with the best investment in each fund towering over the rest, followed by a few other strong investments, followed by a few other decent ones, and then a long tail of investments that don't move the needle for the VC fund. But that long tail is comprised of entrepreneurs and their teams. People who have given years of their lives to a dream that was ultimately not realized. And as I have written many times over the years on this blog, I spent the majority of my time on that long tail. This is irrational behavior if you think about fund economics, but I believe it is rational behavior if you think about firm reputation. The best thing you can do for this long tail is find a good home for the portfolio company. That could be anything from a modest acquisition to an acqui-hire. If you have to do a shutdown, then I like to see it done on terms the entrepreneur can live with. All of these actions require irrational economic behavior from the investor(s). The goal is to get an exit that everyone can feel good about. The goal is not to maximize the VC's returns from a failed investment. Because it doesn't matter to the fund economics one bit, but it can matter a lot to the entrepreneur and his or her team.

### CompsciOverflow

#### NP-hard complexity proof [duplicate]

I have this resource allocation problem (a special case of my main problem), as follows, which I have been trying to show is NP-hard. I have a set of items; each item $i$ has a size $s(i)$ and a revenue $r(i)$.
I have a limited resource $C$ that I need to allocate among some items, and I want to find the least possible revenue. Here is an example: imagine I have 4 items whose sizes are {4,5,5,3} and whose revenues are {6,7,7,5}. Given capacity C=6, the minimum possible revenue is 5, which comes from allocating 3 units to the fourth item; the remaining 3 units cannot be allocated. And if C=11, then the minimum revenue is 11, which comes from allocating 4 to the first item and 3 to the fourth; the remaining 4 units cannot be allocated. So, I am essentially looking for the worst-case revenue among the possible allocation outcomes (I built a MIP formulation for this). I know the basics of proof methods, but I have not been able to find a problem from the NP-hard class reducible to my problem that would show my problem is NP-hard too. As I said, I know about the complexity proof methods; I just cannot find an appropriate problem from the NP-hard class. I am still looking forward to your helpful ideas. Thanks.

### Planet Theory

#### Postdoc at North Carolina State University (apply by January 1, 2016)

The Dept. of Computer Science at NC State University seeks applications for a two-year postdoc position in the Theory In Practice group of Dr. Blair D. Sullivan. The position is funded by the Moore Foundation Data-Driven Discovery Initiative; the research focus is on improving the practicality of parameterized algorithms using graph structure. Anticipated start date is May-Sept 2016. Email: blair_sullivan@ncsu.edu

### StackOverflow

#### Log transform dependent variable for regression tree

I have a dataset where I find that the dependent (target) variable has a skewed distribution, i.e. there are a few very large values and a long tail. When I run the regression tree, one end node is created for the large-valued observations and one end node is created for the majority of the other observations. Would it be OK to log transform the dependent (target) variable and use it for regression tree analysis?
When I tried this, I got a different set of nodes and splits that seem to have a more even distribution of observations in each bucket. With the log transformation, the R-squared value for predicted vs. observed is also quite good. In other words, I seem to get better testing and validation performance with the log transformation. I just want to make sure the log transformation is an accepted way to run a regression tree when the dependent variable has a skewed distribution. Thanks!

### CompsciOverflow

#### How to prove the completeness of a rule set?

I have a knowledge-based system which contains a set of rules. Someone asked me to show the completeness of the rule set, where completeness means that for every possible combination of values of my variables, there is at least one rule that is applicable. For example, suppose I have three variables, $time$, $price$, and $abort$, and the following rules:

$time > 10 \lor price > 2 \rightarrow conclusionA$
$abort = true \rightarrow conclusionB$

How can I show whether my set of rules is complete? Maybe I'm missing the right keywords to search for. Do you have any accessible reference to a method showing how to do this?

### QuantOverflow

#### Significance testing of average returns from the Sharpe ratio

I'm aware that one way to do significance testing on a strategy is based on the sampling distribution of its Sharpe ratio (see, e.g., Lo, 2002 and Opdyke, 2008). However, it appears to me that there's another very simple way to do inference directly on the average return and standard deviation implied by the Sharpe. I'm wondering (1) if this approach is valid, or whether I'm overlooking something; (2) if there's some literature on this and similar approaches. Given a strategy with (required / targeted / IS-based) daily Sharpe ratio $SR = \frac{\mu}{\sigma}$, with return statistics $\mu$ (mean) and $\sigma$ (standard deviation), the question is how long an OS period the strategy needs at minimum to establish statistically significant returns.
Assuming iid returns (a big assumption), the sample mean of the daily returns in the OS period after $n$ days can be modelled with (though it is probably more appropriate to use a t-distribution) $$\hat\mu \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right).$$ We define statistical significance at level $\alpha$ as the $z_{1-\alpha}$ score (one-sided test) from zero (i.e., the null is that $\mu=0$): $$\hat\mu \geq z_{1-\alpha} \frac{\sigma}{\sqrt{n}}.$$ With the definition of the daily Sharpe $SR = \frac{\mu}{\sigma}$ and substituting $\mu=\hat\mu$, this becomes: $$n \geq \left(\frac{z_{1-\alpha}}{SR}\right)^2$$ Intuitively this makes sense: a lower Sharpe or a higher significance level increases the minimum sampling period. For instance, at $\alpha=90\%$ significance we obtain, for a strategy with an annual Sharpe of 1, $n \geq (1.28 \times \sqrt{252} / 1)^2 \approx 413$ days, but at $\alpha=95\%$ already $n \geq 682$ days.

#### Deduce expected exposure profile from option/structure delta?

I am thinking about whether there exists a relationship between the delta of an option (or any structured derivative) and its expected positive/negative exposure. An intuitive question would be the following: a forward has a delta of 1; given the above exposure profile and the delta of an option with the same underlying, can I deduce that the exposure profile of the option equals Delta * Forward_Exposure? However, after running some simulations I see that this is not the case, part of the reason being (I think) that for exposure generation one simulates values for all relevant risk parameters, and not just the one which corresponds to the delta/sensitivity. If there are any questions on the definitions of the terms I used, I am happy to clarify. Image taken from Jon Gregory's book on CVA.

### TheoryOverflow

#### Lossless Compression Books

I am intrigued by compression techniques and I'd like some recommendations of books to study, specifically on lossless compression algorithms and data structures.
I don't know if there is a comprehensive book (or books) that deals with both compression of general sources (streams of bytes) and of known inputs (e.g., integers, strings). Thanks for your references!

#### Number of rounds of iterative one-round distributed color reduction

We are talking about one-round coloring algorithms for distributed graphs. In "On the complexity of distributed graph coloring" (Theorem 5.1), Kuhn and Wattenhoffer presented a one-round algorithm to transform an $m$-coloring into a $q$-coloring, where $q = m(d+1)/(d+2)$ ($d$ is the max degree). We can show that if $m \ge d+2$, then $q \le m-1$. [Therefore the algorithm returns a smaller coloring, as long as $m \ge d+2$.] They mention that by applying this one-round algorithm over and over, we can transform an $m$-coloring into a $(d+1)$-coloring in $O(d \log(m/d))$ rounds. I don't understand how they reached this number. I thought that the number of rounds $n$ needed would fulfill the equation $m((d+1)/(d+2))^n = d+1$, because $m$ is the starting coloring, and in each round we get a smaller coloring by a factor of $(d+1)/(d+2)$. But this doesn't yield the proper result for $n$. Why is it wrong? We want to get $n = O(d\log(m/d))$. The algorithm itself isn't very complicated and I don't think it will contribute to the question; basically, for every color that is larger than $q$, we choose a color from $\{ 1,\ldots,q \}$ in a way that ensures a legal coloring. Does anybody have any insights about the number of rounds needed to achieve a $(d+1)$-coloring?

### StackOverflow

#### Using recursion with map in Python

I am trying to learn functional programming concepts. An exercise: flatten a nested list using map/reduce. My code:

lists = [1, 2, [3, 4, 5], 6, [7, 8, 9]]

def flatten(lists):
    return map(lambda x: flatten(x) if isinstance(x, list) else x, lists)

print flatten(lists)

I get output the same as the input. What did I do wrong? How does recursion work with map()?

### CompsciOverflow

#### How to show all possible implied parentheses?
Can I use recursion to find all the possible parenthesizations we can add to the expression 2*3-4*5?

(2*(3-(4*5))) = -34
((2*3)-(4*5)) = -14
((2*(3-4))*5) = -10
(2*((3-4)*5)) = -10
(((2*3)-4)*5) = 10

I am not able to find the right states for the recursion to proceed, as at any point we wouldn't know if it would have two open parentheses or two closed parentheses. I am just looking for ideas.

### Halfbakery

#### Degrees of Separation Email Spam Protection (1.5)

### TheoryOverflow

#### How do Turing machines compare with process calculi w.r.t. expressivity? [duplicate]

Some time ago there were heated debates, and some researchers argued that so-called interactive computations (understood as "computation = calculation + communication"), like the Pi-calculus, the actor model, and CSP, are a more powerful model which can't be converted to a Turing machine (in the usually understood sense). It was also argued that "interactive computations" are a better model for modern computing, and that using the Turing machine as a model is very limiting (for example, due to being confined to calculating functions, while current computers are essentially interactive). High-profile theorists remain silent on the whole matter. So are there any resolutions to this controversy apart from the stance of Dina Goldin and colleagues? I am aware of the "What would it mean to disprove the Church-Turing thesis?" Q/A, but my question is not about the Church-Turing thesis in its own frame of applicability. I see that each agent can be a TM between input/output phases, but it seems highly counter-intuitive that the process-calculi-based composition of a constructively infinite number of agents (TMs), interacting with the outside world, would be equal in expressive power to a single TM. When interactive input/output is considered, it is not clear whether the context is the same as in the Church-Turing thesis.
Edit: found on topic: Applicability of Church-Turing thesis to interactive models of computation

### Lobsters

#### with — Organize Complex SQL Queries

### CompsciOverflow

#### How to convert this code from a simple array to a pointer array? C++ [on hold]

So I've got the basics of arrays down, but I am confused about how to convert this source code into a program that will use pointers.

void sortArray(int array[], int size)
{
    bool swap;
    int temp;

    do
    {
        swap = false;
        for (int count = 0; count < (size - 1); count++)
        {
            if (array[count] > array[count + 1])
            {
                temp = array[count];
                array[count] = array[count + 1];
                array[count + 1] = temp;
                swap = true;
            }
        }
    } while (swap);
}

#### Instruction Sets - Loading, inverting and storing [on hold]

I'm trying to solve this exercise about instruction sets, but I'm stuck. You have 4 registers with the following contents:

R1 -> 0x0000AAAA
R2 -> 0x00005555
R3 -> 0xAAAAAAAA
R4 -> 0x55555555

You're running this pseudo-code:

1. STORE R1, R3
2. STORE R2, R4
3. INV R3, R3
4. INV R4, R4
5. LOAD R1, R3
6. LOAD R2, R4

What are the contents of R1 and R2 after the code has been executed? My suggestion was that R1 would contain 0x55555555 and R2 would contain 0xAAAAAAAA, since this is the bitwise inversion of R3 and R4, but it turns out R1 should contain 0x00005555 and R2 should contain 0x0000AAAA, and I can't make any sense of it. Please help me understand the logic behind this solution.

### StackOverflow

#### CPS merge sort causes a stack overflow

Since I had problems with stack overflows due to a non-tail recursion, I used continuations to make sorting of large lists feasible.
I implemented the sorting this way (you can see the whole code here: http://paste.ubuntu.com/13481004/):

let merge_sort l =
    let rec merge_sort' l cont =
        match l with
        | [] -> cont []
        | [x] -> cont [x]
        | _ ->
            let (a,b) = split l in
            merge_sort' a (fun leftRes ->
                merge_sort' b (* OVERFLOW HERE *)
                    (fun rightRes ->
                        cont (merge leftRes rightRes)
                    )
            )
    in merge_sort' l (fun x -> x)

I get a stack overflow nevertheless, at the indicated line. What am I doing wrong?

#### Higher-order functions in VHDL or Verilog

I've been programming in more functional-style languages and have gotten to appreciate things like tuples and higher-order functions such as maps and folds/aggregates. Do either VHDL or Verilog have any of these kinds of constructs? It seems like an HDL would be the perfect application of these kinds of pure combinators: while these combinators are merely simulated by an imperative process in Haskell, F#, LINQ, etc., hardware actually is pure functionality between clock ticks. At first glance, both languages look oddly imperative. To me it seems weird that imperative CPU applications are starting to be written in functional languages, while functional HDL code is written in an imperative-looking style. Anyway, the question is: is there any way to do even simple things like

divByThreeCount = count (\x -> x mod 3 == 0) myArray

or

myArray2 = map (\x -> x mod 3) myArray

or, even better, to let me define my own higher-level constructs recursively, in either of these languages? If not, which language is the closest? Does either language seem to be planning to make progress in that direction? Or is this really not as useful as I, being a rookie in HDLs, would think?
#### Alternatives to repeated use of partial when using Clojure's comp

Given a collection:

[{:key "key_1" :value "value_1"}, {:key "key_2" :value "value_2"}]

I would like to convert this to:

{"key_1" "value_1" "key_2" "value_2"}

A function to do this would be:

(defn long->wide [xs]
  (apply hash-map (flatten (map vals xs))))

I might simplify this using the threading macro:

(defn long->wide [xs]
  (->> xs
       (map vals)
       (flatten)
       (apply hash-map)))

This still requires explicit definition of the function argument, which I am not doing anything with other than passing it to the first function. I might then rewrite this using comp to remove it:

(def long->wide
  (comp (partial apply hash-map) flatten (partial map vals)))

This, however, requires repeated use of partial, which to me is a lot of noise in the function. Is there some function in Clojure that combines comp and ->> so I can create a higher-order function without repeated use of partial, and also without having to create a new function?

### CompsciOverflow

#### Average redundancy in Huffman or Hu-Tucker codes on random symbol probabilities

Huffman and Hu-Tucker codes are well-known compression schemes which both come close to the entropy lower bound. It is known that if $L_1$ and $L_2$ are the lengths of a Huffman resp. Hu-Tucker code, then $H\le L_1 \le H+1$ and $H\le L_2< H+2$, where $H$ is the (base-2 Shannon) entropy of the symbol distribution. Is anything known about the average redundancy of Huffman and Hu-Tucker codes, that is, the expected value of $R_1 = L_1-H$ or $R_2 = L_2-H$, when the symbol weights are random?

Definition & Details:

Assume we have $n$ symbols $a_1,\ldots,a_n$ with weights $\vec P=(P_1,\ldots,P_n)$, where these weights are themselves random, e.g., they are uniformly drawn from all stochastic vectors (so that $P_i\ge0$ and $P_1+\cdots+P_n = 1$ a.s.); stated otherwise: $\vec P$ has a $\mathrm{Dirichlet}(1,\ldots,1)$ distribution.
Then $H = - \sum_{i=1}^n P_i \log_2(P_i)$ and $L_{1,2} = \sum P_i \operatorname{depth}(a_i)$, where $\mathrm{depth}(a_i)$ is the depth (number of edges on the path from the root) of leaf $a_i$ in an optimal binary tree on leaves $a_1,\ldots,a_n$ (Huffman tree) for $L_1$, and the depth in an optimal binary search tree with leaves $a_1\le a_2\le\cdots\le a_n$ (optimal alphabetic tree) for $L_2$.

## My Attempts:

I am seeking an analytic solution, in the best case as a closed formula in $n$. I can approximate the expectations with a Monte Carlo simulation; here are a few example values (from 10,000,000 repetitions each; up to the given digits several repetitions agreed; * only 10,000 repetitions):

n    E[R2]
3    0.297
4    0.284
5    0.268
6    0.253*
10   0.217*

Note that the upper bound of $2$ is far from tight here. This question is probably related, but was posed unclearly and does not have an answer. A somewhat similar problem has been solved by Szpankowski. There, alphabet symbols are blocks of bits of length $n$, generated by a memoryless source where $p<0.5$ is the probability of a $0$ in the block. The average redundancy of different codes, including Huffman's, is computed for large $n$. There the symbol probabilities are fixed, and so is the redundancy; the average means average redundancy over all symbols, not over random symbol weights. Thus it does not answer my question above.

### QuantOverflow

#### European call down and out option (geometric Brownian motion, Monte Carlo, Euler)

I need to estimate the expected value and the Greeks of a European down-and-out call option, assuming geometric Brownian motion of the asset, with a Monte Carlo simulation employing the Euler discretization scheme. Given are the strike (K), spot (S0), barrier (B), volatility (v), risk-free rate (r) and time to maturity (T). I can program the code, but I can't find the mathematics behind the calculations. Can someone show them to me? Thank you in advance.
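The question above goes unanswered in the thread, but the usual textbook recipe can be sketched. The following is a hedged, illustrative Python sketch (not a production pricer; all parameter values and helper names are made up here): simulate paths with the Euler step $S_{t+\Delta t} = S_t(1 + r\,\Delta t + v\sqrt{\Delta t}\,Z)$ with $Z \sim N(0,1)$, kill any path that touches the barrier $B$, average the discounted payoff $e^{-rT}\max(S_T-K,0)$ over surviving paths, and estimate Greeks by bump-and-revalue with common random numbers.

```python
import math, random

def down_and_out_call(S0, K, B, r, v, T, n_steps=100, n_paths=5000, seed=0):
    # Euler scheme for dS = r*S*dt + v*S*dW under the risk-neutral measure.
    # The option knocks out (pays zero) if the path ever touches or falls
    # below the barrier B at a monitoring date.
    rng = random.Random(seed)
    dt = T / n_steps
    sqdt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        S = S0
        knocked = S <= B
        step = 0
        while not knocked and step < n_steps:
            S += S * (r * dt + v * sqdt * rng.gauss(0.0, 1.0))
            knocked = S <= B
            step += 1
        if not knocked:
            total += max(S - K, 0.0)
    return math.exp(-r * T) * total / n_paths

def delta(S0, K, B, r, v, T, h=0.5):
    # Central finite difference with common random numbers (same seed),
    # so the simulation noise cancels between the two valuations.
    up = down_and_out_call(S0 + h, K, B, r, v, T, seed=1)
    dn = down_and_out_call(S0 - h, K, B, r, v, T, seed=1)
    return (up - dn) / (2 * h)

price = down_and_out_call(S0=100, K=100, B=80, r=0.05, v=0.2, T=1.0)
```

One caveat worth noting with this scheme: the barrier is only checked at the grid times, so discrete Euler monitoring overstates the price of a continuously monitored barrier option; the bias shrinks as the number of steps grows.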
### UnixOverflow

#### Disable SCSI (CAM) device

I want to tell the FreeBSD kernel to completely ignore a SCSI cam(4) device.

# cat /var/run/dmesg.boot | grep '\<da1\>'
ada0: 600.000MB/s transfers (SATA 3.x,
da1 at umass-sim0 bus 0 scbus4 target 0 lun 1
da1: <HP iLO LUN 01 Media 0 2.09> Fixed Direct Access SCSI-0 device
da1: Serial Number 000002660A01
da1: 40.000MB/s transfers
da1: 1024MB (2097152 512 byte sectors: 64H 32S/T 1024C)
da1: quirks=0x2<NO_6_BYTE>
# camcontrol devlist
<ST1000DM003-1ER162 CC46>       at scbus1 target 0 lun 0 (pass0,ada0)
<ST1000DM003-1ER162 CC46>       at scbus2 target 0 lun 0 (pass1,ada1)
<HP iLO Internal SD-CARD 2.09>  at scbus4 target 0 lun 0 (pass2,da0)
<HP iLO LUN 01 Media 0 2.09>    at scbus4 target 0 lun 1 (pass3,da1)

I don't know what the da1 device is, nor do I want it visible on my system. FreeNAS sees this device and offers it as an option to create a volume on, which is not something I ever want to do. How can I accomplish this?

Update: It appears this can be accomplished using device hints, setting the keyword "disabled" to 1:

hint.driver.unit.keyword="value"

The only problem is, I'm not sure how to specify the driver and unit for this device.

### Lobsters

#### What is a 'unikernel'?

### CompsciOverflow

#### Show that each of these languages is decidable for regular grammars

Show that each of these languages is decidable for regular grammars by presenting a clear algorithm for each. In each case, assume <G> is the encoding of a regular grammar G as a string and L(G) is the language generated by G.

a. ERegular = { <G> | L(G) = ∅ }
b. InfiniteRegular = { <G> | L(G) is infinite }
c. EQRegular = { <G, H> | L(G) = L(H) }

Algorithms should use concepts such as reachability and product machine. You may assume that any regular grammar can be converted into a DFA without reproving this fact. You may assume that it can be determined whether one state can be reached (in one or more steps) from another within a DFA. Hello, I'm having a bit of trouble on this problem.
I know what a regular grammar is, but what confuses me is the "show each language is decidable for regular grammars" part. What exactly does that mean? The TA told me my algorithm can be informal, but how should my algorithm be structured? I really need to see an example before I can fully understand this. Any help is appreciated!

#### The symbolic differentiation of univariate expressions

I was reading "Doug McIlroy: McCarthy Presents Lisp" and the phrase "symbolic differentiation of univariate expressions" triggered a faint memory of a demonstration of differentiation done in Haskell using higher-order functions. (I think my memory is of using the language to produce a function that is the derivative of a given function.) However, I haven't been able to find any other reference to the above phrase in Lisp or ML-based languages. Does anyone have more information on this, and is "symbolic differentiation of univariate expressions" the same thing as the memory I described?

### QuantOverflow

#### Logistic Regression of tick data

I've been given some data (it's financial tick data) and I want to predict, based on some observed variables, whether the next move will be up, down or unchanged. So I have been trying to use multinomial logistic regression. This is my first time doing logistic regression, so I want to check that I have done this correctly and that my results look reasonable. Right now I am doing three bivariate logistic regressions. I code the data such that I have three new time series labelled up, down and unchanged. These are generated by testing whether the move n steps ahead was up, down or unchanged in the original series, adding a one to the appropriate array and making all other entries zero. I then do a bivariate logistic regression of these up, down and unchanged arrays against the regressors individually.
I can then calculate the probability of each using the transformation: $$\textrm{prob} = 1/(1+\exp[-\beta x])$$ where $\beta$ are the betas from the bivariate logistic regressions and $x$ is the value of the regressors. This gives me the probability of up, down and unchanged. I then simply compare: if $\textrm{probability of UP} > \textrm{probability of Down}$, the model predicts up, and vice versa.

Q.1) Is my methodology correct? Right now I am doing all calculations on the price series (not the returns series).

Q.2) When I test this in sample I am getting 70% accuracy (for both up and down moves). Is that a reasonable test score in sample?

Q.3) The model probability for unchanged is very low, typically around 14%, so unchanged is never selected (because the probabilities of up and down moves are always much larger). However, unchanged is the most commonly observed change, with an unconditional probability of 91%. Is there a way I can correct the model so that unchanged is forecast accurately?

Update: Here is the code. Unfortunately I am getting differences between the 2-variable regression results and the 3-variable results! A possible error between the two: a 2-variable regression run using mnrfit() and an equivalent 3-variable version on asset returns. The returns have been classified as positive, negative or flat. The logistic regression is then run on the returns vs the classified returns (this is a simple test to check that the regression functions as intended). When I do this for the 2-variable version, i.e. up returns v everything else, the regression gives an estimate of 88% for the probability that a 0 return is not an up return. As the return size is increased, the probability that it is positive increases, eventually converging to 1 (as you would expect). Also, as increasingly negative returns are put into the logistic regression model, the probability that the return is positive goes to zero.
The same is true for the 2-variable estimate of down returns v everything else. The figures are similar to those above but with the signs of the returns reversed. Now for the 3-variable version. Things initially look OK: when given a return of zero it estimates the probability that it is zero to be 86%, with a probability of a down move of 6.4% and an up move of 7.6%, so very similar to the 2-variable case. Moreover, when larger and larger returns are entered, the probability that it is a positive return converges to 1, as you would expect. But when you put in larger and larger negative returns, the probability that the return is negative converges to zero while the probability that it is equal to zero increases to 1, which is clearly wrong.

Data1 = LoadMat_SHFE_Range(Contract1, StartDate, EndDate);
rtn = (Data1.Mid(2:end) - Data1.Mid(1:end-1)) ./ (Data1.Mid(1:end-1));
NStep = 0;
Up = nan(length(rtn),1);
Down = nan(length(rtn),1);
Flat = nan(length(rtn),1);
RtnClass = nan(length(rtn),1);
for i = 1:length(rtn)-NStep
    if(rtn(i+NStep)>0)
        Up(i)=2; Down(i)=1; Flat(i)=1;
    elseif(rtn(i+NStep)<0)
        Up(i)=1; Down(i)=2; Flat(i)=1;
    elseif(rtn(i+NStep)==0)
        Up(i)=1; Down(i)=1; Flat(i)=2;
    end
end
[BUp,dev,stats] = mnrfit(rtn,Up);
MatProbUp = mnrval(BUp,0.1);
[BDown,dev,stats] = mnrfit(rtn,Down);
MatProbDown = mnrval(BDown,0.1);
for i = 1:length(rtn)
    if(rtn(i)>0)
        RtnClass(i)=3;
    elseif(rtn(i)<0)
        RtnClass(i)=2;
    elseif(rtn(i)==0)
        RtnClass(i)=1;
    end
end
[BM,dev,stats] = mnrfit(rtn,RtnClass,'model','ordinal');
[pihat,dlow,hi] = mnrval(BM,0,stats,'model','ordinal');

### CompsciOverflow

#### Bipartite Graph - How to determine largest subsets that are all connected

I have a bipartite graph $G = (U,V,E)$, where $U$ and $V$ are disjoint node sets, $U \cup V$ is the set of all vertices, and $E$ is the set of all edges. I'm looking for subsets $U' \subseteq U$ and $V' \subseteq V$ such that there exists an edge $\{i,j\} \in E$ for all $i \in U', j \in V'$ and $|U'| + |V'|$ is maximum.
Ideally, I'm looking for the absolute best solution (a single one is sufficient if there is more than one), but happy to find any solution so that$U'$contains at least 50% of$U$and$V'$contains at least 50% of$V$(and no solution if no such solution exists). The naive algorithm (construct all subsets, calculate if they satisfy the requirements and pick the largest) does not seem very promising to find a quick solution. I'm not that familiar with the terms in graph theory, so I'll be perfectly happy with references to "standard problems" if that's what I'm looking for. ### StackOverflow #### linear regression with multiple variables in matlab, formula and code do not match I have the following datasets: X X = 1.0000 0.1300 -0.2237 1.0000 -0.5042 -0.2237 1.0000 0.5025 -0.2237 1.0000 -0.7357 -1.5378 1.0000 1.2575 1.0904 1.0000 -0.0197 1.0904 1.0000 -0.5872 -0.2237 1.0000 -0.7219 -0.2237 1.0000 -0.7810 -0.2237 1.0000 -0.6376 -0.2237 1.0000 -0.0764 1.0904 1.0000 -0.0009 -0.2237 1.0000 -0.1393 -0.2237 1.0000 3.1173 2.4045 1.0000 -0.9220 -0.2237 1.0000 0.3766 1.0904 1.0000 -0.8565 -1.5378 1.0000 -0.9622 -0.2237 1.0000 0.7655 1.0904 1.0000 1.2965 1.0904 1.0000 -0.2940 -0.2237 1.0000 -0.1418 -1.5378 1.0000 -0.4992 -0.2237 1.0000 -0.0487 1.0904 1.0000 2.3774 -0.2237 1.0000 -1.1334 -0.2237 1.0000 -0.6829 -0.2237 1.0000 0.6610 -0.2237 1.0000 0.2508 -0.2237 1.0000 0.8007 -0.2237 1.0000 -0.2034 -1.5378 1.0000 -1.2592 -2.8519 1.0000 0.0495 1.0904 1.0000 1.4299 -0.2237 1.0000 -0.2387 1.0904 1.0000 -0.7093 -0.2237 1.0000 -0.9584 -0.2237 1.0000 0.1652 1.0904 1.0000 2.7864 1.0904 1.0000 0.2030 1.0904 1.0000 -0.4237 -1.5378 1.0000 0.2986 -0.2237 1.0000 0.7126 1.0904 1.0000 -1.0075 -0.2237 1.0000 -1.4454 -1.5378 1.0000 -0.1871 1.0904 1.0000 -1.0037 -0.2237  theta 0 0 0  y y = 399900 329900 369000 232000 539900 299900 314900 198999 212000 242500 239999 347000 329999 699900 259900 449900 299900 199900 499998 599000 252900 255000 242900 259900 573900 249900 464500 469000 475000 299900 
349900 169900 314900 579900 285900 249900 229900 345000 549000 287000 368500 329900 314000 299000 179900 299900 239500

The X set holds the values for multiple-variable regression; the first column stands for X0, the second for X1, and so on. The implementation formula is something like: I have implemented Matlab code which is:

for i=1:size(theta,1)
    h=X*theta;
    sumE=sum((h-y).*X(:,i));
    theta(i)=theta(i)-alpha*(1/m)*sumE;
end

which is inside a for loop going from 1 to some number n of iterations (the value of m is not relevant; it can be set to 40, for example). The problem is that even though the code works and the result is the one expected, when I submit it to an online checking program it says my results are wrong. The reason is that I should update theta simultaneously. I have gotten the following Matlab code from the Internet:

h = X*theta;
theta = theta - alpha / m * (X'*(h - y));

When I run the Internet solution it gives me almost the same answer as mine, with only a subtle difference in the 6th decimal position. When I submit that answer to the online program it is fully accepted, but I was wondering: where did the summation go? The formula explicitly indicates a summation which is no longer in the Internet solution. Maybe both codes are fine, but I do not know if the Internet author has made some linear algebra trick. Any help? Thanks

#### learn to cluster matrices according to matrix orthogonality [on hold]

I have a large number of matrix sets where each matrix's number of columns < number of rows (so the columns span only a part of the possible column space). The matrices of each set are clustered into clusters of 4 by some unknown logic, but it is very related to how orthogonal the columns of the matrices are (clusters will include matrices whose combined columns are as close to orthogonal as possible).
I am looking for the correct machine learning tool which, given the clusters created, will imitate the logic and, using n-fold cross-validation, tell me how close it comes to imitating it. A Python solution would be best, but any algorithm, machine learning tool (e.g. Weka), programming language or framework will be highly appreciated.

### CompsciOverflow

#### Set of Kolmogorov-random strings is co-r.e.

Given that RC = {x : C(x) ≥ |x|} is the set of Kolmogorov-random strings, how can I show that RC is co-r.e.? I have been reading the paper "What Can be Efficiently Reduced to the Kolmogorov-Random Strings?", which just says that by Kum96 we know that RC is co-r.e., but I was not able to find the proof of how that is proved.

### StackOverflow

#### What is a good way to group similar but irregular user input [on hold]

I have some data that I want to summarize; however, the input from users is not always identical, for example: ABC Ltd, ABC Limited, ABC ltd., etc. In this case, of course, I can take the leftmost 3 characters, but my question is how to program this dynamically, perhaps by identifying patterns? In the end I need a uniform set of names for aggregation. Thanks.

#### How do I create my own classifier for RMOS Gurgaon? [on hold]

I am creating my own classifier for face detection. I have two folders, one for storing positive images and the other for storing negative images, and I make .txt files for both. Now I want to create training samples of the positive images, so I give the command 'opencv_createsamples -info positives.txt -vec myvec.vec -w 24 -h 24'. But it shows the following and doesn't create any samples. What is the reason? Could anyone help me?

Info file name: positives.txt
Img file name: (NULL)
Vec file name: myvec.vec
BG file name: (NULL)
Num: 1000
BG color: 0
BG threshold: 80
Invert: FALSE
Max intensity deviation: 40
Max x angle: 1.1
Max y angle: 1.1
Max z angle: 0.5
Show samples: FALSE
Width: 24
Height: 24
Create training samples from images collection...
positives.txt(1) : parse error
Done. Created 0 samples

#### Filtering a list of JavaBeans with Google Guava

In a Java program, I have a list of beans that I want to filter based on a specific property. For example, say I have a list of Person, a JavaBean, where Person has many properties, among them 'name'. I also have a list of names. Now I want to find all the persons whose name is in the name list. What is the best way to execute this filter using Google Guava? So far, I've thought about combining Guava with Apache BeanUtils, but that doesn't seem elegant. I've also found a reflection extension library here: http://code.google.com/p/guava-reflection/, but I'm not sure how to use it (there's little documentation). Any thoughts? P.S. Can you tell I really miss Python list comprehensions?

### CompsciOverflow

#### How to solve a recurrence relation in the form of $T(n) = T(f(n))*T(g(n)) + h(n)$

I am basically trying to solve the following question: Given a set $P = \{\{1\},\{2\},\dots,\{n\}\}$ of $n$ sets of elements, our aim is to merge these elements into one set. At each step, sets can only be merged pairwise. How many different ways can this merge happen?
For instance, given $P = \{\{a\},\{b\},\{c\},\{d\}\}$: After the first step we can end up with three different combinations: \begin{array}{c} P_{11} = \{\{a,b\}, \{c,d\}\}\\ P_{12} = \{\{a,c\}, \{b,d\}\}\\ P_{13} = \{\{a,d\}, \{b,c\}\}\\ \end{array} And all the sets will be merged after the second step: \begin{array}{c} P_{2} = \{\{a,b,c,d\}\}\\ \end{array} All the examples I could find about recurrences are summations of two recurring functions, such as $T(n) = T(n) + T(n/2) + \Theta(n)$. However, I want to solve a recurrence relation where the recurring function is multiplied by itself: $T(n) = T(n^2-n)*T(n/2)$. I am not sure that I have written the recursion correctly, but this is what I want: \begin{array}{l} n\\ \dfrac{n(n-1)}{2}*\dfrac{n}{2}\\ \dfrac{n(n-1)}{2}*\dfrac{n}{2}*\dfrac{\frac{n}{2}\left(\frac{n}{2}-1\right)}{2}*\dfrac{n}{4}\\ \end{array} At first sight, the number of nodes in the recursion tree grows like $O(n) \rightarrow O(n^3) \rightarrow O(n^6) \dots$ at each level, which can be formulated as $n^{2k}$. Since the size of the problem is reduced by a factor of 2, we have $\log n$ depth in the recursion tree. Therefore, $n^{2k} = \log n$ and $2k = \log_{n}\log n$, which does not seem a reasonable size. Where am I mistaken?

#### Java - Splitting an image into 4 [on hold]

So basically I have an image and I need to split it into 4 smaller images. All I have right now is that the image prints in the top-left corner, but I need to print one in the top right, bottom left and bottom right. I have two for loops that go through each pixel of the image. This is the code that prints out just the top-left:

setRGB(x/2, y/2, everypixel)

The "everypixel" is just the pixels in the image. That will print out just the top-left, but how do I print it out for every other corner? Thanks.
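The missing piece in the image-splitting question above is that each quadrant needs its own destination offset: pixel (x, y) of the source goes to (x - xOffset, y - yOffset) in its quadrant, with the offsets being 0 or half the width/height. As a hedged sketch (Python instead of Java, a plain 2D list of pixel values instead of a BufferedImage, and even dimensions assumed for simplicity), the same logic collapses into four slices:

```python
def split_quadrants(img):
    # img: 2D list of pixel values (list of rows), with even height and
    # width assumed for this sketch.
    h, w = len(img), len(img[0])
    top_left     = [row[:w // 2] for row in img[:h // 2]]
    top_right    = [row[w // 2:] for row in img[:h // 2]]
    bottom_left  = [row[:w // 2] for row in img[h // 2:]]
    bottom_right = [row[w // 2:] for row in img[h // 2:]]
    return top_left, top_right, bottom_left, bottom_right
```

In the Java version the analogue would be something like `out.setRGB(x - w/2, y, img.getRGB(x, y))` when copying the top-right quadrant: the source coordinates run over the quadrant's region, and only the destination coordinates are shifted back to start at (0, 0).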
### Lobsters

#### DEF CON 2 (1994)

#### DEF CON 1 (1993)

### StackOverflow

#### sharing data between classes using public static declaration

I made a program and divided it into several files (system_flow.cs, make_array.cs, etc.) that share the values of variables by using public static declarations. But my teacher (a programmer) said that using public static to share the values of variables is not good for the program. Can you tell me why it is not good? Also, does that mean I should unite all the functions into one class file? I used public static like this:

private string NAME;
public string Name
{
    get { return this.NAME; }
    set { this.NAME = value; }
}

• My teacher said the reason it is not good is that when people work together and someone modifies the public data, the program starts to screw up.

#### Does Functional Programming Replace GoF Design Patterns?

Since I started learning F# and OCaml last year, I've read a huge number of articles which insist that design patterns (especially in Java) are workarounds for the missing features in imperative languages. One article I found makes a fairly strong claim:

Most people I've met have read the Design Patterns book by the Gang of Four. Any self-respecting programmer will tell you that the book is language agnostic and the patterns apply to software engineering in general, regardless of which language you use. This is a noble claim. Unfortunately it is far removed from the truth. Functional languages are extremely expressive. In a functional language one does not need design patterns because the language is likely so high level, you end up programming in concepts that eliminate design patterns all together.

The main features of functional programming include functions as first-class values, currying, immutable values, etc. It doesn't seem obvious to me that OO design patterns are approximating any of those features.
Additionally, in functional languages which support OOP (such as F# and OCaml), it seems obvious to me that programmers using these languages would use the same design patterns available to every other OOP language. In fact, right now I use F# and OCaml every day, and there are no striking differences between the patterns I use in these languages vs the patterns I use when I write in Java. Is there any truth to the claim that functional programming eliminates the need for OOP design patterns? If so, could you post or link to an example of a typical OOP design pattern and its functional equivalent?

#### Q-Learning technique for not falling into fires

Please take a look at the picture below. My objective is that the agent rotates and moves in the environment without falling into fire holes. I have thought about it like this:

Do for 1000 episodes:
    An episode: start to traverse the environment;
    if the agent falls into a hole, go back to the first place!

I have read somewhere that a goal is an end point for an episode. So if we think that the goal is not to fall into the fires, the opposite of the goal (i.e. falling into a fire hole) will be the end point of an episode. What would you suggest for goal setting? Another question: why should I set up the reward matrix? I have read that Q-learning is model-free! I know that in Q-learning we set up the goal and not the way of achieving it (in contrast to supervised learning).

#### scikit-learn multi dimensional features

I have a question concerning scikit-learn. Is it possible to merge a multi-dimensional feature list into one feature vector? For example: I have results from an application analysis and I would like to represent an application with one feature vector. In the case of network traffic, an analysis result looks like the following:

traffic = [
    {
        "http_body": "http_body_data",
        "length": 1024
    },
    {
        "http_body2": "http_body_data2",
        "length": 2048
    },
    ...
    and many more
]

So each dict in the traffic list describes one network activity of a specific application. I would like to generate a feature vector which contains all this information for one application, to be able to generate a model out of the analysis results from a variety of applications. How can I do this with scikit-learn? Thank you in advance!

### Lobsters

#### How to Make Mistakes in Python

### XKCD

#### Hoverboard

### CompsciOverflow

#### Complexity analysis on directed acyclic graphs

Consider a directed acyclic graph (DAG) -- that is, a directed graph without any directed cycles. An example of such a graph is shown below. I assume that each node takes on binary values, $0$ or $1$. Now, let $x_i\in\{0,1\}^N$ ($N=6$ in the above DAG) be a valid state on a DAG if, for any enabled node in the DAG, all of its predecessor nodes are also enabled. The complete list of such valid states for the above example DAG is shown below. Let $\mathcal{X} = \{x_1,x_2,\ldots,x_{11}\}$. I wish to characterize the size of $\mathcal{X}$ as a function of the DAG. Specifically, for a given DAG, can we say anything about the order in which $|\mathcal{X}|$ grows as a function of some property of the DAG? My intuition tells me that it should grow on the order of $2^{\max_n \{\text{deg}^-(n)\}}$, where $\text{deg}^-(n)$ denotes the indegree of node $n$, but I can't seem to prove this. Does anyone have any ideas or suggestions for some relevant literature? I originally asked this question on the CS theory stack exchange but was told to post here. I received the following comments: Kaveh: "You essentially want to know the number of directed sets of a given poset. The tight upper bound is $O(2^n)$.
For max out-degree 2 and max in-degree 1 you can get $2^{O(n)}$" Ricky Demer: "In fact, one automatically gets $2^n$ from [max out-degree 0 and max in-degree 0]" "The number of valid states is always at least 2 to [ the size of the largest antichain in the corresponding partial order ]" EDIT: I've included a couple of examples of extreme cases in the figure below (also, I should mention that I'm only interested in connected graphs). From the above figure, the graph on the left has $2^6+1 = 65$ valid states, whereas the one on the right only has $7$ valid states. What property of these graphs results in such a drastic difference in the dimensionality of $\mathcal{X}$? Assigning a notion of "width" and "height" to the DAG -- the graph on the left has a large "width" and a small "height", whereas the graph on the right has a small "width" and large "height" -- it seems like the dimension grows on the order of a function of "width/height". Is this a correct way of thinking about this problem? Can this idea be generalized to any DAG?

#### Delete a range of keys in a binary search tree in better than $O(n\lg n)$?

Obviously, the brute-force method of:

DeleteRange(root, low, high)
    for n = low to high
        if n == root.key                // key found
            return DeleteNode(root)     // O(lg n) to delete
        elseif n < root.key             // in left sub-tree
            root.left = DeleteRange(root.left, n, high)    // recur into left sub-tree
        elseif n > root.key             // in right sub-tree
            root.right = DeleteRange(root.right, n, high)  // recur into right sub-tree
        else                            // root.key == null, key not found
            return null
    return root

would take $O(n\lg n)$ time. So is there any "smarter" way of deleting that would do the same thing with less complexity, perhaps pruning entire sub-trees at once? Assume that the deleted nodes are not needed and the only concern is returning the root of a binary search tree where the nodes between the range of two keys are deleted.
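One standard answer to the range-deletion question, sketched here in Python under the stated assumptions (integer keys, plain unbalanced nodes), is exactly the "prune whole sub-trees" idea: split the tree just below `low` and just above `high`, throw the middle piece away, and join the two outer pieces. Each split walks a single root-to-leaf path, so the whole deletion costs O(h) for a tree of height h, regardless of how many keys fall in the range. Note the naive join used below preserves BST order but not balance.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def split(root, k):
    # Split into (tree with keys < k, tree with keys >= k).
    # Only the nodes on one root-to-leaf path are visited.
    if root is None:
        return None, None
    if root.key < k:
        lt, ge = split(root.right, k)
        root.right = lt
        return root, ge
    else:
        lt, ge = split(root.left, k)
        root.left = ge
        return lt, root

def join(a, b):
    # Precondition: every key in a is smaller than every key in b.
    # Hang b under the rightmost node of a (order-preserving, not balanced).
    if a is None:
        return b
    if b is None:
        return a
    node = a
    while node.right is not None:
        node = node.right
    node.right = b
    return a

def delete_range(root, low, high):
    # Remove every key in [low, high] with two splits and one join;
    # the discarded middle tree is pruned wholesale, never walked.
    left, rest = split(root, low)
    _middle, right = split(rest, high + 1)  # high + 1 assumes integer keys
    return join(left, right)
```

For balanced trees (AVL, red-black, treaps), the same split/join decomposition exists with rebalancing built into join, keeping the whole operation O(log n).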
#### Solve ${\neg (P \wedge Q)} \vdash {Q \to \neg P}$ [on hold]

${\neg (P \wedge Q)} \vdash {Q \to \neg P}$

I don't know where to start; negation confuses me. I have to solve this in Isabelle (a program), but if someone explains how to solve it using natural deduction, that will be enough help.

### QuantOverflow

#### How does Algorithmic Differentiation work and where can it be applied?

The title says it all, but let me expand on it. Algorithmic differentiation seems to be a method that allows a program/compiler to determine the derivative of a function. I imagine it's a great tool for things like optimization, since many of these algorithms work better if we know the cost function and its derivative. I know that algorithmic differentiation is not the same as numerical differentiation, whose derivative is not exact. But is it then the same as symbolic differentiation, as implemented in e.g. SymPy, Matlab or Mathematica? Second, where can it actually be used in quantitative finance? I mentioned optimization, so calibration comes to mind. Greeks are also a natural application, since these are derivatives by definition, but how does that work in the case of Monte Carlo? And are there any other applications?

### StackOverflow

#### Can someone answer the following questions about this function?

def listsum(numList):
    if len(numList) == 1:
        return numList[0]
    else:
        return numList[0] + listsum(numList[1:])

print(listsum([2,4,6,8,10]))

How many recursive calls are made when computing the sum of the list [2,4,6,8,10]? I think it's 5, but maybe I am wrong. Also, what will the recursive function listsum return if we call it with the empty list? I think it should be 0, because theSum is initially 0, but maybe I am wrong.

#### How to use Google TensorFlow

How can I use Google TensorFlow (or any other machine learning API) to automatically classify documents (scanned .pdf, images) on the basis of company name, financial year, etc.?
Any number of factors could be used to analyse the data.

#### Man-made object detection in an image [on hold]

I am not very familiar with Matlab (image processing). I have a very simple task: I have a lot of images in a directory, and I have to find man-made objects, e.g. cars, chairs, mobiles, etc. How do I do it? Please provide details, i.e. at least the steps. Some images also contain a little noise. I have to find these objects in the given images, if they exist there, e.g. make a rectangle around them.

### CompsciOverflow

#### Shortest integer path [on hold]

An integer path in $\mathbb{Z}$ is some finite integer sequence $(a_i)_{i =1,\ldots, n}$. We denote the length of an integer path by $\sum_i |a_{i + 1} - a_i|$, and we denote an integer path by $a_1 \to \ldots \to a_n$. If $a_{i_1} \le k \le a_{i_2}$, we regard the integer paths $a_1 \to \ldots \to a_{i_1} \to a_{i_2}\to \ldots \to a_n$ and $a_1 \to \ldots \to a_{i_1} \to k \to a_{i_2} \to \ldots \to a_n$ as the same. We say that an integer $k$ is in the path if there exists $i$ with $a_i \le k \le a_{i + 1}$. Let $i_k = \min \{i \mid a_i \le k \le a_{i + 1}\}$ for $k$ in the path $a_1 \to \ldots \to a_n$. Then the length of the integer path segment between $k_1$ and $k_2$, written $\rho (k_1, k_2)$, is the length of the integer path $$k_1 \to a_{i_{k_1} + 1} \ldots \to a_{i_{k_2}} \to k_2$$ In particular, the length of an integer path segment satisfies $\rho(x,z) \le \rho(x,y) + \rho(y,z)$ and $\rho(x,y) \ge |x - y|$. Let $x_1,l_1,\ldots,x_n,l_n$ be positive integers with $x_1 < \ldots < x_n$. We need to find the shortest integer path in $\mathbb{Z}$ beginning at some positive integer $x$ with the properties: $x_1,\ldots,x_n$ are in this path, and the length of the path segment (not the line segment!) between $x$ and $x_i$ is not more than $l_i$. I want a good algorithm to find the length of this shortest path, or an error if no such path exists. Desired asymptotics for me: if $\max_i (x_i) < n^2$, then solution time $O(n^2)$. First example: $(x_1,\ldots,x_5) = (1, 3, 5, 8, 10)$, $(l_1,\ldots,l_5) = (3, 1, 8, 19, 15)$.
Shortest path: $$x = x_2 = 3(length = 0 < 1) \to 1(length = 2 < 3) \to 5(length = 6 < 8) \to 8(length = 9 < 19) \to 10(length = 11 < 15)$$ Shortest path length $= 11$. Second example: $(x_1, x_2, x_3, x_4, x_5) = (1, 2, 3, 4, 5)$, $(l_1, l_2, l_3, l_4, l_5) = (5, 1, 2, 4, 3)$. No solution, because the path must begin $x = x_2 = 2 \to x_5 = 5$ (if not, then $\rho(x,x_2) > l_2$ or $\rho(x,x_5) > l_5$), but $|x_5 - x_1| > l_1 - \rho(x,x_5)$.

My ideas: First, we can use this algorithm:

1) Sort $l = (l_1,\ldots, l_n)$.
2) Let $sort(l) = (l_{i_1}, \ldots, l_{i_n})$. Then our path is $x = x_{i_1} \to \ldots \to x_{i_n}$.

Counterexample: $(x_1, x_2, x_3) = (1, 2, 3)$, $(l_1, l_2, l_3) = (2, 1, 3)$. With this algorithm, the length of our path is $3$, but $x_1 \to x_2 \to x_3$ is fine too and its length is $2$. Algorithms like brute force or enumeration of all correct paths have bad asymptotics. I haven't any good idea for this question. Also posted at http://stackoverflow.com/questions/33725674/algorithm-for-shortest-path-in-axis-ox-with-constraints , but no answers. Thank you for any help!

### Lobsters

#### Is Google violating their own HPKP RFC or am I missing something obvious?

### StackOverflow

#### Backpropagation algorithm through cross-channel local response normalization (LRN) layer

I am working on replicating a neural network. I'm trying to get an understanding of how the standard layer types work. In particular, I'm having trouble finding a description anywhere of how cross-channel normalization layers behave on the backward pass. Since the normalization layer has no parameters, I could guess two possible options:

1. The error gradients from the next (i.e. later) layer are passed backwards without doing anything to them.
2. The error gradients are normalized in the same way the activations are normalized across channels in the forward pass.

I can't think of a reason why you'd do one over the other based on any intuition, hence I'd like some help on this.
EDIT1: The layer is a standard layer in Caffe, as described here: http://caffe.berkeleyvision.org/tutorial/layers.html (see 'Local Response Normalization (LRN)'). The layer's implementation in the forward pass is described in section 3.3 of the AlexNet paper: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

EDIT2: I believe the forward and backward pass algorithms are described in both the Torch library here: https://github.com/soumith/cudnn.torch/blob/master/SpatialCrossMapLRN.lua and in the Caffe library here: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/lrn_layer.cpp Could anyone who is familiar with either/both of these translate the method for the backward pass stage into plain English?

### Planet Theory

#### Postdoc at Princeton Computer Science (apply by Dec 15, 2015)

The Department of Computer Science at Princeton University is seeking applications for postdoctoral or more senior research positions in theoretical computer science and theoretical machine learning. Positions are for one year with the possibility of renewal. Candidates must have a PhD in Computer Science or a related field. These positions are subject to the University's background check policy. Email: mbraverm@cs.princeton.edu

### Lobsters

#### MarI/O - Machine Learning for Video Games

#### JUnit Lambda - The Prototype

In-depth exploration of the JUnit 5 prototype.

### StackOverflow

#### How to test if two functions are the same?

I found a code snippet somewhere online:

(letrec ([id (lambda (v) v)]
         [ctx0 (lambda (v) `(k ,v))]
         .....
         .....
  (if (memq ctx (list ctx0 id))   ; <---- condition always returns false
      .....

where ctx is also a function. However, I could never make the test statement return true. Then I have the following test:

(define ctx0 (lambda (v) `(k ,v)))
(define ctx1 (lambda (v) `(k ,v)))

(eq? ctx0 ctx1) => #f
(eqv? ctx0 ctx1) => #f
(equal?
ctx0 ctx1) => #f

This makes me suspect that two functions always compare as different, since they have different memory locations. But if functions can be compared against other functions, how can I test whether two functions are the same? And what if they have different variable names, for example (lambda (x) (+ x 1)) and (lambda (y) (+ y 1))?

P.S. I use DrRacket to test the code.

### arXiv Data Structures and Algorithms

#### Approximation Algorithms for Route Planning with Nonlinear Objectives. (arXiv:1511.07412v1 [cs.DS])

We consider optimal route planning when the objective function is a general nonlinear and non-monotonic function. Such an objective models user behavior more accurately, for example, when a user is risk-averse, or the utility function needs to capture a penalty for early arrival. It is known that the problem becomes NP-hard as nonlinearity arises, and little is known about computing optimal solutions when, in addition, there is no monotonicity guarantee. We show that an approximately optimal non-simple path can be efficiently computed under some natural constraints. In particular, we provide a fully polynomial approximation scheme under hop constraints. Our approximation algorithm can be extended to run in pseudo-polynomial time under a more general linear constraint that is sometimes useful. As a by-product, we show that our algorithm can be applied to the problem of finding a path that is most likely to be on time for a given deadline.

#### Ad auctions and cascade model: GSP inefficiency and algorithms. (arXiv:1511.07397v1 [cs.GT])

The design of the best economic mechanism for Sponsored Search Auctions (SSAs) is a central task in computational mechanism design/game theory. Two open questions concern the adoption of user models more accurate than the one currently used and the choice between the Generalized Second Price auction (GSP) and the Vickrey-Clarke-Groves mechanism (VCG). In this paper, we provide some contributions to answer these questions.
We study the Price of Anarchy (PoA) and the Price of Stability (PoS) over social welfare and auctioneer's revenue of GSP w.r.t. the VCG when the users follow the famous cascade model. Furthermore, we provide exact, randomized, and approximate algorithms, showing that in real-world settings (Yahoo! Webscope A3 dataset, 10 available slots) optimal allocations can be found in less than 1s with up to 1000 ads, and can be approximated in less than 20ms even with more than 1000 ads with an average accuracy greater than 99%.

#### GPU-based Acceleration of Deep Convolutional Neural Networks on Mobile Platforms. (arXiv:1511.07376v1 [cs.DC])

Mobile applications running on wearable devices and smartphones can greatly benefit from accurate and scalable deep CNN-based machine learning algorithms. While mobile CPU performance does not match the intensive computational requirements of deep CNNs, the embedded GPU that already exists in many mobile platforms can be leveraged to accelerate CNN computations on the local device, without the use of a cloud service. We present a GPU-based accelerated deep CNN engine for mobile platforms with up to 60X speedup.

#### Optimal Trading with Linear and (small) Non-Linear Costs. (arXiv:1511.07359v1 [q-fin.TR])

We reconsider the problem of optimal trading in the presence of linear and quadratic costs, for arbitrary linear costs but in the limit where quadratic costs are small. Using matched asymptotic expansion techniques, we find that the trading speed vanishes inside a band that is narrower than in the absence of quadratic costs, by an amount that scales as the one-third power of quadratic costs. Outside of the band, we find three regimes: a small boundary layer where the velocity vanishes linearly with the distance to the band, an intermediate region where the velocity behaves as the square root of that distance, and a far region where it becomes linear. Our solution is consistent with available numerical results.
We determine the conditions in which our expansion is useful in practical applications, and generalize our solution to other forms of non-linear costs.

#### A Note on Flagg and Friedman's Epistemic and Intuitionistic Formal Systems. (arXiv:1511.07319v1 [cs.LO])

We report our findings on the properties of Flagg and Friedman's translation from Epistemic into Intuitionistic logic, which was proposed as the basis of a comprehensive proof method for the faithfulness of the Gödel translation. We focus on the propositional case and raise the issue of the admissibility of the translated necessitation rule. Then, we contribute to Flagg and Friedman's program by giving an explicit proof of the soundness of their translation.

#### 1-perfectly orientable graphs and graph products. (arXiv:1511.07314v1 [math.CO])

A graph G is said to be 1-perfectly orientable (1-p.o. for short) if it admits an orientation such that the out-neighborhood of every vertex is a clique in G. The class of 1-p.o. graphs forms a common generalization of the classes of chordal and circular arc graphs. Even though 1-p.o. graphs can be recognized in polynomial time, no structural characterization of 1-p.o. graphs is known. In this paper we consider the four standard graph products: the Cartesian product, the strong product, the direct product, and the lexicographic product. For each of them, we characterize when a nontrivial product of two graphs is 1-p.o.

#### Adapting the serial Alpgen event generator to simulate LHC collisions on millions of parallel threads. (arXiv:1511.07312v1 [hep-ph])

As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility.
Alpgen is a Monte Carlo event generation application that is used by LHC experiments to simulate the collisions that take place in the Large Hadron Collider. This paper details the process by which Alpgen was adapted from a single-processor serial application to a large-scale parallel application, and the performance that was achieved.

#### Local Reasoning with First-Class Heaps, and a New Frame Rule. (arXiv:1511.07267v1 [cs.PL])

Separation Logic (SL) brought an advance to program verification of data structures by interpreting (recursively defined) predicates as implicit heaps, and using a separating conjoin operator to construct heaps from disjoint subheaps. While the Frame Rule of SL facilitated local reasoning in program fragments, its restriction to disjoint subheaps means that any form of sharing between predicates is problematic. With this as background motivation, we begin with an assertion language in which subheaps may be explicitly defined within predicates, and the effect of separation obtained by specifying that certain heaps are disjoint. The strength of this base language is not just its expressiveness: it is also amenable to symbolic execution and therefore to automatic program verification. In this paper, we extend this base language with a new frame rule to accommodate subheaps and non-separating conjoining of subheaps so as to provide compositional reasoning. This significantly extends both the expressiveness and automatability of the base language. Finally, we use our framework to automatically prove two significant example programs: one concerning a summary of program fragments, and one exhibiting structure sharing in data structures.

#### Ridge Leverage Scores for Low-Rank Approximation. (arXiv:1511.07263v1 [cs.DS])

Often used as importance sampling probabilities, leverage scores have become indispensable in randomized algorithms for linear algebra, optimization, graph theory, and machine learning.
A major body of work seeks to adapt these scores to low-rank approximation problems. However, existing "low-rank leverage scores" can be difficult to compute, often work for just a single application, and are sensitive to matrix perturbations. We show how to avoid these issues by exploiting connections between low-rank approximation and regularization. Specifically, we employ ridge leverage scores, which are simply standard leverage scores computed with respect to an $\ell_2$ regularized input. Importance sampling by these scores gives the first unified solution to two of the most important low-rank sampling problems: $(1+\epsilon)$ error column subset selection and $(1+\epsilon)$ error projection-cost preservation. Moreover, ridge leverage scores satisfy a key monotonicity property that does not hold for any prior low-rank leverage scores. Their resulting robustness leads to two sought-after results in randomized linear algebra. 1) We give the first input-sparsity time low-rank approximation algorithm based on iterative column sampling, resolving an open question posed in [LMP13], [CLM+15], and [AM15]. 2) We give the first single-pass streaming column subset selection algorithm whose real-number space complexity has no dependence on stream length.

#### A Python Extension for the Massively Parallel Multiphysics Simulation Framework waLBerla. (arXiv:1511.07261v1 [cs.DC])

We present a Python extension to the massively parallel HPC simulation toolkit waLBerla. waLBerla is a framework for stencil-based algorithms operating on block-structured grids, with the main application field being fluid simulations in complex geometries using the lattice Boltzmann method. Careful performance engineering results in excellent node performance and good scalability to over 400,000 cores. To increase the usability and flexibility of the framework, a Python interface was developed.
Python extensions are used at all stages of the simulation pipeline: they simplify and automate scenario setup, evaluation, and plotting. We show how our Python interface outperforms the existing text-file-based configuration mechanism, providing features like automatic nondimensionalization of physical quantities and handling of complex parameter dependencies. Furthermore, Python is used to process and evaluate results while the simulation is running, leading to smaller output files and the possibility of adjusting parameters depending on the current simulation state. C++ data structures are exported such that seamless interfacing to other numerical Python libraries is possible. The expressive power of Python and the performance of C++ make it possible to develop efficient code with little effort.

#### On the total $(k,r)$-domination number of random graphs. (arXiv:1511.07249v1 [cs.DM])

A subset $S$ of the vertex set of a graph $G$ is a total $(k,r)$-dominating set if every vertex $u \in V(G)$ is within distance $k$ of at least $r$ vertices in $S$. The minimum cardinality among all total $(k,r)$-dominating sets of $G$ is called the total $(k,r)$-domination number of $G$, denoted by $\gamma^{t}_{(k,r)}(G)$. We previously gave an upper bound on $\gamma^{t}_{(2,r)}(G(n,p))$ in random graphs with non-fixed $p \in (0,1)$. In this paper we generalize this result to give an upper bound on $\gamma^{t}_{(k,r)}(G(n,p))$ in random graphs with non-fixed $p \in (0,1)$ for $k \geq 3$, and we also present an upper bound on $\gamma^{t}_{(k,r)}(G)$ in graphs with large girth.

#### Improving the performance of the linear systems solvers using CUDA. (arXiv:1511.07207v1 [cs.DC])

Parallel computing can offer an enormous advantage in performance for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high-performance many-core processors that can achieve very high FLOP rates.
Since the first idea of using GPUs for general-purpose computing, things have evolved, and now there are several approaches to GPU programming: CUDA from NVIDIA and Stream from AMD. CUDA is now a popular programming model for general-purpose computations on GPUs for C/C++ programmers. A great number of applications have been ported to the CUDA programming model, obtaining speedups of orders of magnitude compared to optimized CPU implementations. In this paper we present an implementation of a library for solving linear systems using the CUDA framework. We present the results of performance tests and show that using the GPU one can obtain speedups of approximately 80 times compared with a CPU implementation.

#### Medusa: An Efficient Cloud Fault-Tolerant MapReduce. (arXiv:1511.07185v1 [cs.DC])

Applications such as web search and social networking have been moving from centralized to decentralized cloud architectures to improve their scalability. MapReduce, a programming framework for processing large amounts of data using thousands of machines in a single cloud, also needs to be scaled out to multiple clouds to adapt to this evolution. The challenge of building a multi-cloud distributed architecture is substantial. Notwithstanding, the ability to deal with the new types of faults introduced by such a setting, such as the outage of a whole datacenter or an arbitrary fault caused by a malicious cloud insider, increases the endeavor considerably. In this paper we propose Medusa, a platform that allows MapReduce computations to scale out to multiple clouds and tolerate several types of faults. Our solution fulfills four objectives. First, it is transparent to the user, who writes her typical MapReduce application without modification. Second, it does not require any modification to the widely used Hadoop framework.
Third, the proposed system goes well beyond the fault tolerance offered by MapReduce, tolerating arbitrary faults, cloud outages, and even malicious faults caused by corrupt cloud insiders. Fourth, it achieves this increased level of fault tolerance at reasonable cost. We performed an extensive experimental evaluation in the ExoGENI testbed, demonstrating that our solution significantly reduces execution time when compared to traditional methods that achieve the same level of resilience.

#### Longest Gapped Repeats and Palindromes. (arXiv:1511.07180v1 [cs.DS])

A gapped repeat (respectively, palindrome) occurring in a word $w$ is a factor $uvu$ (respectively, $u^Rvu$) of $w$. In such a repeat (palindrome), $u$ is called the arm of the repeat (respectively, palindrome), while $v$ is called the gap. We show how to compute efficiently, for every position $i$ of the word $w$, the longest gapped repeat and palindrome occurring at that position, provided that the length of the gap is subject to various types of restrictions. That is, for each position $i$ we compute the longest prefix $u$ of $w[i..n]$ such that $uv$ (respectively, $u^Rv$) is a suffix of $w[1..i-1]$ (thus defining a gapped repeat $uvu$ - respectively, a palindrome $u^Rvu$), and the length of $v$ is subject to the aforementioned restrictions.

#### Developing a High Performance Software Library with MPI and CUDA for Matrix Computations. (arXiv:1511.07174v1 [cs.DC])

Nowadays, the paradigm of parallel computing is changing. CUDA is now a popular programming model for general-purpose computations on GPUs, and a great number of applications have been ported to CUDA, obtaining speedups of orders of magnitude compared to optimized CPU implementations. Hybrid approaches that combine the message passing model with the shared memory model for parallel computing are a solution for very large applications.
We considered a heterogeneous cluster that combines CPU and GPU computations using MPI and CUDA to develop a high-performance linear algebra library. Our library deals with large linear systems solvers because they are a common problem in the fields of science and engineering. Direct methods for computing the solution of such systems can be very expensive due to high memory requirements and computational cost. An efficient alternative is iterative methods, which compute only an approximation of the solution. In this paper we present an implementation of a library that uses a hybrid model of computation with MPI and CUDA, implementing both direct and iterative linear systems solvers. Our library implements LU and Cholesky factorization based solvers and some of the non-stationary iterative methods using the MPI/CUDA combination. We compared the performance of our MPI/CUDA implementation with classic programs written to be run on a single CPU.

#### Optimizing Solution Quality in Synchronization Synthesis. (arXiv:1511.07163v1 [cs.PL])

Given a multithreaded program written assuming a friendly, non-preemptive scheduler, the goal of synchronization synthesis is to automatically insert synchronization primitives to ensure that the modified program behaves correctly, even with a preemptive scheduler. In this work, we focus on the quality of the synthesized solution: we aim to infer synchronization placements that not only ensure correctness, but also meet some quantitative objectives such as optimal program performance on a given computing platform. The key step that enables solution optimization is the construction of a set of global constraints over synchronization placements such that each model of the constraint set corresponds to a correctness-ensuring synchronization placement. We extract the global constraints from generalizations of counterexample traces and the control-flow graph of the program.
The global constraints enable us to choose from among the encoded synchronization solutions using an objective function. We consider two types of objective functions: ones that are solely dependent on the program (e.g., minimizing the size of critical sections) and ones that are also dependent on the computing platform. For the latter, given a program and a computing platform, we construct a performance model based on measuring average contention for critical sections and the average time taken to acquire and release a lock under a given average contention. We show empirically that our approach scales to typical module sizes of many real-world concurrent programs such as device drivers and multithreaded servers, and that the performance predictions match reality. To the best of our knowledge, this is the first comprehensive approach to optimizing the placement of synthesized synchronization.

#### NearBucket-LSH: Efficient Similarity Search in P2P Networks. (arXiv:1511.07148v1 [cs.DC])

We present NearBucket-LSH, an effective algorithm for similarity search in large-scale distributed online social networks organized as peer-to-peer overlays. As communication is a dominant consideration in distributed systems, we focus on minimizing the network cost while guaranteeing good search quality. Our algorithm is based on Locality Sensitive Hashing (LSH), which limits the search to collections of objects, called buckets, that have a high probability of being similar to the query. More specifically, NearBucket-LSH employs an LSH extension that searches in near buckets, which improves search quality but also significantly increases the network cost. We decrease the network cost by considering the internals of both LSH and the P2P overlay, harnessing their properties to our needs. We show that NearBucket-LSH increases search quality for a given network cost compared to prior art. In many cases, the search quality increases by more than 50%.
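As background, the bucket mechanism the abstract refers to can be sketched with a toy random-hyperplane (cosine) LSH. This is a generic illustration of the LSH primitive only, not the NearBucket-LSH algorithm or its P2P overlay; all parameter values here are invented:

```python
import numpy as np

rng = np.random.default_rng(42)

def make_hash(dim, n_bits):
    """Random-hyperplane LSH: the sign pattern of n_bits projections is the bucket key."""
    planes = rng.normal(size=(n_bits, dim))
    return lambda v: tuple(int(x > 0) for x in planes @ v)

dim, n_bits = 32, 8
h = make_hash(dim, n_bits)

# Index the corpus: objects with the same key share a bucket.
corpus = rng.normal(size=(1000, dim))
buckets = {}
for idx, v in enumerate(corpus):
    buckets.setdefault(h(v), []).append(idx)

# A query scans only its own bucket rather than the whole corpus.
# (NearBucket-LSH would additionally visit "near" buckets, i.e. keys
# at small Hamming distance from h(q).)
q = corpus[0]
candidates = buckets[h(q)]
print(0 in candidates, len(candidates) < len(corpus))  # True True
```

Similar vectors have a high probability of sharing a key, which is why restricting the search to the candidate bucket (and, in the paper, its near buckets) trades a little recall for a large reduction in objects touched, and hence in network cost.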
#### A PAC Approach to Application-Specific Algorithm Selection. (arXiv:1511.07147v1 [cs.LG])

The best algorithm for a computational problem generally depends on the "relevant inputs," a concept that depends on the application domain and often defies formal articulation. While there is a large literature on empirical approaches to selecting the best algorithm for a given application domain, there has been surprisingly little theoretical analysis of the problem. This paper adapts concepts from statistical and online learning theory to reason about application-specific algorithm selection. Our models capture several state-of-the-art empirical and theoretical approaches to the problem, ranging from self-improving algorithms to empirical performance models, and our results identify conditions under which these approaches are guaranteed to perform well. We present one framework that models algorithm selection as a statistical learning problem, and our work here shows that dimension notions from statistical learning theory, historically used to measure the complexity of classes of binary- and real-valued functions, are relevant in a much broader algorithmic context. We also study the online version of the algorithm selection problem, and give possibility and impossibility results for the existence of no-regret learning algorithms.

#### Identity Testing and Lower Bounds for Read-$k$ Oblivious Algebraic Branching Programs. (arXiv:1511.07136v1 [cs.CC])

Read-$k$ oblivious algebraic branching programs are a natural generalization of the well-studied model of read-once oblivious algebraic branching programs (ROABPs). In this work, we give an exponential lower bound of $\exp(n/k^{O(k)})$ on the width of any read-$k$ oblivious ABP computing some explicit multilinear polynomial $f$ that is computed by a polynomial-size depth-$3$ circuit. We also study the polynomial identity testing (PIT) problem for this model and obtain a white-box subexponential-time PIT algorithm.
The algorithm runs in time $2^{\tilde{O}(n^{1-1/2^{k-1}})}$ and needs white-box access only to know the order in which the variables appear in the ABP.

#### A reduction of the logspace shortest path problem to biconnected graphs. (arXiv:1511.07100v1 [cs.CC])

In this paper, we reduce the logspace shortest path problem to biconnected graphs; in particular, we present a logspace shortest path algorithm for general graphs which uses a logspace shortest path oracle for biconnected graphs. We also present a linear-time logspace shortest path algorithm for graphs with bounded vertex degree and biconnected component size, which does not rely on an oracle. The asymptotic time-space product of this algorithm is the best possible among all shortest path algorithms.

#### Designing a mobile game to thwart malicious IT threats: A phishing threat avoidance perspective. (arXiv:1511.07093v1 [cs.CY])

Phishing is a form of online identity theft that aims to steal sensitive information such as usernames, passwords and online banking details from victims. To prevent this, phishing education needs to be considered. Game-based education is becoming more and more popular. This paper introduces a mobile game prototype for the Android platform based on a story which simplifies and exaggerates real life. The elements of a game design framework for avoiding phishing attacks were used to address the game design issues, and game design principles were used as a set of guidelines for structuring and presenting information. The overall mobile game design aims to enhance the user's avoidance behaviour through motivation to protect themselves against phishing threats. The prototype mobile game design was presented on the MIT App Inventor Emulator.

#### Max-sum diversity via convex programming. (arXiv:1511.07077v1 [cs.DS])

Diversity maximization is an important concept in information retrieval, computational geometry and operations research.
Usually, it is a variant of the following problem: given a ground set, constraints, and a function $f(\cdot)$ that measures the diversity of a subset, the task is to select a feasible subset $S$ such that $f(S)$ is maximized. The \emph{sum-dispersion} function $f(S) = \sum_{x,y \in S} d(x,y)$, which is the sum of the pairwise distances in $S$, is in this context a prominent diversification measure. The corresponding diversity maximization problem is \emph{max-sum} or \emph{sum-sum diversification}. Many recent results deal with the design of constant-factor approximation algorithms for diversification problems involving the sum-dispersion function under a matroid constraint. In this paper, we present a PTAS for the max-sum diversification problem under a matroid constraint for distances $d(\cdot,\cdot)$ of \emph{negative type}. Distances of negative type include, for example, metric distances stemming from the $\ell_2$ and $\ell_1$ norms, as well as the cosine, spherical, and Jaccard distances, which are popular similarity measures in web and image search.

#### Which Regular Expression Patterns are Hard to Match?. (arXiv:1511.07070v1 [cs.CC])

Regular expressions constitute a fundamental notion in formal language theory and are frequently used in computer science to define search patterns. In particular, regular expression matching is a widely used computational primitive, employed in many programming languages and text processing utilities. A classic algorithm for regular expression matching runs in $O(mn)$ time (where $m$ is the length of the pattern and $n$ is the length of the text). This running time can be improved by a poly-logarithmic factor, but no significantly faster solutions are known. At the same time, much faster algorithms exist for various special cases of regular expressions, including dictionary matching, wildcard matching, subset matching, etc.
In this paper, we show that the complexity of regular expression matching can be characterized based on its depth (when interpreted as a formula). Very roughly, our results state that for expressions involving concatenation, OR and Kleene plus, the following dichotomy holds:

* Matching regular expressions of depth two (involving any combination of the above operators) can be solved in near-linear time. In particular, this case covers the aforementioned variants of regular expression matching amenable to fast algorithms.

* Matching regular expressions of depth three (involving any combination of the above operators) that are not reducible to some depth-two expressions cannot be solved in sub-quadratic time unless the Strong Exponential Time Hypothesis (SETH) is false.

For expressions involving concatenation, OR and Kleene star our results are similar, with one notable exception: we show that pattern matching with depth-two regular expressions that are concatenations of Kleene stars is SETH-hard. Otherwise the results are the same as described above, but with Kleene plus replaced by Kleene star.

#### Constant Factor Approximation for ATSP with Two Edge Weights. (arXiv:1511.07038v1 [cs.DS])

We give a constant-factor approximation algorithm for the Asymmetric Traveling Salesman Problem on shortest path metrics of directed graphs with two different edge weights. For the case of unit edge weights, the first constant-factor approximation was given recently in [Sve15]. This was accomplished by introducing an easier problem called Local-Connectivity ATSP and showing that a good solution to this problem can be used to obtain a constant-factor approximation for ATSP. In this paper, we solve Local-Connectivity ATSP for two different edge weights. The solution is based on a flow decomposition theorem for solutions of the Held-Karp relaxation, which may be of independent interest.

#### Occurrence Typing Modulo Theories.
(arXiv:1511.07033v1 [cs.PL])

We present a new type system combining occurrence typing, previously used to type check programs in dynamically typed languages such as Racket, JavaScript, and Ruby, with dependent refinement types. We demonstrate that the addition of refinement types allows the integration of arbitrary solver-backed reasoning about logical propositions from external theories. By building on occurrence typing, we can add our enriched type system as an extension of Typed Racket---adding dependency and refinement reuses the existing formalism while increasing its expressiveness. Dependent refinement types allow Typed Racket programmers to express rich type relationships, ranging from data structure invariants such as red-black tree balance to preconditions such as vector bounds. Refinements allow programmers to embed the propositions that occurrence typing in Typed Racket already reasons about into their types. Further, extending occurrence typing to refinements allows us to make the underlying formalism simpler and more powerful. In addition to presenting the design of our system, we present a formal model of the system, show how to integrate it with theories over both linear arithmetic and bitvectors, and evaluate the system in the context of the full Typed Racket implementation. Specifically, we take safe vector access as a case study and examine all vector accesses in a 56,000-line corpus of Typed Racket programs. Our system is able to prove that 50% of these are safe with no new annotations, and with a few annotations and modifications we can capture close to 80%.

#### On a Natural Dynamics for Linear Programming. (arXiv:1511.07020v1 [cs.DS])

In this paper we study dynamics inspired by Physarum polycephalum (a slime mold) for solving linear programs [NTY00, IJNT11, JZ12].
These dynamics are arrived at by a local and mechanistic interpretation of the inner workings of the slime mold, and a global optimization perspective has been lacking even in the simplest of instances. Our first result is an interpretation of the dynamics as an optimization process. We show that Physarum dynamics can be seen as a steepest-descent type algorithm on a certain Riemannian manifold. Moreover, we prove that the trajectories of Physarum are in fact paths of optimizers to a parametrized family of convex programs, in which the objective is a linear cost function regularized by an entropy barrier. Subsequently, we rigorously establish several important properties of solution curves of Physarum. We prove global existence of such solutions and show that they have limits, these being optimal solutions of the underlying LP. Finally, we show that the discretization of the Physarum dynamics is efficient for a class of linear programs which includes unimodular constraint matrices. Thus, together, our results shed some light on how nature might be solving instances of perhaps the most complex problem in P: linear programming.

#### Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster. (arXiv:1511.07017v1 [cs.DC])

Mining frequent itemsets from massive datasets has always been one of the most important problems in data mining. Apriori is the most popular and simplest algorithm for frequent itemset mining. To enhance the efficiency and scalability of Apriori, a number of algorithms have been proposed, addressing the design of efficient data structures, minimizing database scans, and parallel and distributed processing. MapReduce is the emerging parallel and distributed technology for processing big datasets on a Hadoop cluster. To mine big datasets it is essential to re-design the data mining algorithms for this new paradigm. In this paper, we implement three variations of the Apriori algorithm using the data structures hash tree, trie and hash table trie, i.e.
trie with hash technique, on the MapReduce paradigm. We emphasize and investigate the significance of these three data structures for the Apriori algorithm on a Hadoop cluster, which has not been given attention yet. Experiments are carried out on both real-life and synthetic datasets, and they show that the hash table trie data structure performs far better than the trie and the hash tree in terms of execution time; the hash tree performs worst of all.

#### Generating Configurable Hardware from Parallel Patterns. (arXiv:1511.06968v1 [cs.DC])

In recent years the computing landscape has seen an increasing shift towards specialized accelerators. Field programmable gate arrays (FPGAs) are particularly promising as they offer significant performance and energy improvements compared to CPUs for a wide class of applications and are far more flexible than fixed-function ASICs. However, FPGAs are difficult to program. Traditional programming models for reconfigurable logic use low-level hardware description languages like Verilog and VHDL, which have none of the productivity features of modern software development languages but produce very efficient designs, and low-level software languages like C and OpenCL coupled with high-level synthesis (HLS) tools that typically produce designs that are far less efficient. Functional languages with parallel patterns are a better fit for hardware generation because they both provide high-level abstractions to programmers with little experience in hardware design and avoid many of the problems faced when generating hardware from imperative languages. In this paper, we identify two optimizations that are important when using parallel patterns to generate hardware: tiling and metapipelining. We present a general representation of tiled parallel patterns, and provide rules for automatically tiling patterns and generating metapipelines.
We demonstrate experimentally that these optimizations result in speedups up to 40x on a set of benchmarks from the data analytics domain. #### Constructive Galois Connections: Taming the Galois Connection Framework for Mechanized Metatheory. (arXiv:1511.06965v1 [cs.PL]) Galois connections are a foundational tool for structuring abstraction in semantics and their use lies at the heart of the theory of abstract interpretation. Yet, mechanization of Galois connections has remained limited to certain restricted modes of use, preventing their fully general application in mechanized metatheory and certified programming. This paper presents constructive Galois connections, a framework for Galois connections that is effective both on paper and in proof assistants; is complete with respect to the set of Galois connections with computational content; and enables more general reasoning principles, including the "calculational" style advocated by Cousot. Crucial to our technical approach is the addition of monadic structure to Galois connections to control a "specification effect." Effectful calculations may reason classically, while pure calculations have extractable computational content. Explicitly moving between the worlds of specification and implementation is enabled by our metatheory. To validate our approach, we provide two case studies in mechanizing existing proofs from the literature: one uses calculational abstract interpretation to design a static analyzer, the other forms a semantic basis for gradual typing. Both mechanized proofs closely follow their original paper-and-pencil counterparts, employ reasoning principles not captured by previous mechanization approaches, support the extraction of verified algorithms, and are novel. #### Budgetary Effects on Pricing Equilibrium in Online Markets. 
(arXiv:1511.06954v1 [cs.GT])

Following the work of Babaioff et al.~\cite{BNL14}, we consider the pricing game with strategic vendors and a single buyer, modeling a scenario in which multiple competing vendors have very good knowledge of a buyer, as is common in online markets. We add to this model the realistic assumption that the buyer has a fixed budget and does not have unlimited funds. When the buyer's valuation function is additive, we are able to completely characterize the different possible pure Nash equilibria (PNE) and in particular obtain a necessary and sufficient condition for uniqueness. Furthermore, we characterize the market-clearing (or Walrasian) equilibria for all submodular valuations. Surprisingly, for certain monotone submodular valuations, we show that the pure NE can exhibit some counterintuitive phenomena; namely, there is a valuation such that the pricing will be market clearing and within budget if the buyer does not reveal the budget, but will result in a smaller set of allocated items (and higher prices for items) if the buyer does reveal the budget. It is also the case that the conditions that guarantee market clearing in Babaioff et al.~\cite{BNL14} for submodular functions are not necessarily market clearing when there is a budget. Furthermore, with respect to social welfare, while without budgets all equilibria are optimal (i.e. POA = POS = 1), we show that with budgets the worst equilibrium may only achieve $\frac{1}{n-2}$ of the best equilibrium.

#### Ironing in the Dark. (arXiv:1511.06918v1 [cs.GT])

This paper presents the first polynomial-time algorithm for position and matroid auction environments that learns, from samples from an unknown bounded valuation distribution, an auction with expected revenue arbitrarily close to the maximum possible. In contrast to most previous work, our results apply to arbitrary (not necessarily regular) distributions and the strongest possible benchmark, the Myerson-optimal auction.
Learning a near-optimal auction for an irregular distribution is technically challenging because it requires learning the appropriate "ironed intervals," a delicate global property of the distribution.

#### Countering Social Engineering through Social Media: An Enterprise Security Perspective. (arXiv:1511.06915v1 [cs.CY])

The increasing threat of social engineers targeting social media channels to advance their attacks on company data has seen many organizations introduce initiatives to better understand these vulnerabilities. This paper examines concerns of social engineering through social media within the enterprise and explores countermeasures undertaken to stem the ensuing risk. Also included is an analysis of existing social media security policies and guidelines within the public and private sectors.

#### A Simple Algorithm For Replacement Paths Problem. (arXiv:1511.06905v1 [cs.DS])

Let $G=(V,E)$ (with $|V|=n$ and $|E|=m$) be an undirected graph with positive edge weights. Let $P_{G}(s,t)$ be a shortest $s$-$t$ path in $G$, and let $l$ be the number of edges in $P_{G}(s,t)$. The \emph{Edge Replacement Path} problem is to compute a shortest $s$-$t$ path in $G\setminus\{e\}$ for every edge $e$ in $P_{G}(s,t)$. The \emph{Node Replacement Path} problem is to compute a shortest $s$-$t$ path in $G\setminus\{v\}$ for every vertex $v$ in $P_{G}(s,t)$. In this paper we present an $O(T_{SPT}(G)+m+l^2)$ time and $O(m+l^2)$ space algorithm for both problems, where $T_{SPT}(G)$ is the asymptotic time to compute a single-source shortest path tree in $G$. The proposed algorithm is simple and easy to implement.

#### Quantum approach to Bertrand duopoly. (arXiv:1511.06892v1 [cs.GT])

The aim of the paper is to study the Bertrand duopoly example in the quantum domain. We use two ways to write the game in terms of quantum theory. The first one adapts the Li-Du-Massar scheme for the Cournot duopoly. The second one is a simplified model that exploits a two-qubit entangled state.
In both cases we focus on finding Nash equilibria in the resulting games. #### Network Topology Adaptation and Interference Coordination for Energy Saving in Heterogeneous Networks. (arXiv:1511.06888v1 [cs.NI]) Interference coupling in heterogeneous networks introduces the inherent non-convexity to the network resource optimization problem, hindering the development of effective solutions. A new framework based on multi-pattern formulation has been proposed in this paper to study the energy efficient strategy for joint cell activation, user association and multicell multiuser channel allocation. One key feature of this interference pattern formulation is that the patterns remain fixed and independent of the optimization process. This creates a favorable opportunity for a linear programming formulation while still taking interference coupling into account. A tailored algorithm is developed to solve the formulated network energy saving problem in the dual domain by exploiting the problem structure, which gives a significant complexity saving compared to using standard solvers. Numerical results show a huge improvement in energy saving achieved by the proposed scheme. #### Patterns of trading profiles at the Nordic Stock Exchange. A correlation-based approach. (arXiv:1511.06873v1 [q-fin.TR]) We investigate the trading behavior of Finnish individual investors trading the stocks selected to compute the OMXH25 index in 2003 by tracking the individual daily investment decisions. We verify that the set of investors is a highly heterogeneous system under many aspects. We introduce a correlation based method that is able to detect a hierarchical structure of the trading profiles of heterogeneous individual investors. We verify that the detected hierarchical structure is highly overlapping with the cluster structure obtained with the approach of statistically validated networks when an appropriate threshold of the hierarchical trees is used. 
We also show that the combination of the correlation-based method and the statistically validated method provides a way to expand the information about the clusters of investors with similar trading profiles in a robust and reliable way.

#### EMinRET: Heuristic for Energy-Aware VM Placement with Fixed Intervals and Non-preemption. (arXiv:1511.06825v1 [cs.NI])

Infrastructure-as-a-Service (IaaS) clouds have become popular, enabling users to run applications in virtual machines. This paper investigates energy-aware virtual machine (VM) allocation problems in IaaS clouds with the following characteristics: multiple resources, fixed interval times, and non-preemption of virtual machines. Many previous works proposed using a minimum number of physical machines; however, this is not necessarily a good solution for minimizing total energy consumption in VM placement with multiple resources, fixed interval times and non-preemption. We observed that minimizing the total energy consumption of physical machines is equivalent to minimizing the sum of the total completion times of all physical machines. Based on this observation, we propose the EMinRET algorithm. EMinRET swaps an allocating VM with a suitable overlapped VM, which is of the same VM type and is allocated on the same physical machine, to minimize the total completion time of all physical machines. EMinRET uses resource utilization during the execution time period of a physical machine as its evaluation metric, and then chooses the host that minimizes the metric to allocate a new VM. In addition, this work studies some heuristics for sorting the list of virtual machines (e.g., by earliest starting time, or longest duration first) before allocation.
Using a realistic log trace from the Parallel Workloads Archive, our simulation results show that the EMinRET algorithm reduces energy consumption by 25% to 45% compared with power-aware best-fit decreasing (PABFD) and vector bin-packing norm-based greedy algorithms. Moreover, the EMinRET heuristic also has lower total energy consumption than our previous heuristics (e.g. MinDFT and EPOBF) in the simulations (using the same VM sorting method).

#### Key Exchange Trust Evaluation in Peer-to-Peer Sensor Networks with Unconditionally Secure Key Exchange. (arXiv:1511.06795v1 [cs.CR])

As the utilization of sensor networks continues to increase, the importance of security becomes more profound. Many industries depend on sensor networks for critical tasks, and a malicious entity can potentially cause catastrophic damage. We propose a new key exchange trust evaluation for peer-to-peer sensor networks in which part of the network has unconditionally secure key exchange. For a given sensor, the higher the portion of channels with unconditionally secure key exchange, the higher the trust value. We give a brief introduction to unconditionally secure key exchange concepts and mention current trust measures in sensor networks. We demonstrate the new key exchange trust measure on a hypothetical sensor network using both wired and wireless communication channels.

#### Unifying and Strengthening Hardness for Dynamic Problems via the Online Matrix-Vector Multiplication Conjecture. (arXiv:1511.06773v1 [cs.DS])

Consider the following Online Boolean Matrix-Vector Multiplication problem: We are given an $n\times n$ matrix $M$ and will receive $n$ column-vectors of size $n$, denoted by $v_1,\ldots,v_n$, one by one. After seeing each vector $v_i$, we have to output the product $Mv_i$ before we can see the next vector. A naive algorithm can solve this problem using $O(n^3)$ time in total, and its running time can be slightly improved to $O(n^3/\log^2 n)$ [Williams SODA'07].
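The online setting described in the abstract can be made concrete with a small sketch of the naive algorithm (Boolean product as OR-of-ANDs, $O(n^2)$ work per arriving vector, $O(n^3)$ total); this is just an illustration of the problem statement, not the paper's construction:

```python
import numpy as np

def online_mv(M, vectors):
    """Naive online Boolean matrix-vector multiplication:
    each answer must be emitted before the next vector arrives."""
    outputs = []
    for v in vectors:                 # vectors arrive one by one
        out = (M @ v) > 0             # Boolean product: OR of ANDs over 0/1 entries
        outputs.append(out.astype(int))
        # in the online setting `out` is output here, before seeing the next v
    return outputs

n = 4
rng = np.random.default_rng(0)
M = rng.integers(0, 2, size=(n, n))
vs = [rng.integers(0, 2, size=n) for _ in range(n)]
print(online_mv(M, vs))
```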
We show that a conjecture that there is no truly subcubic ($O(n^{3-\epsilon})$) time algorithm for this problem can be used to exhibit the underlying polynomial time hardness shared by many dynamic problems. For a number of problems, such as subgraph connectivity, Pagh's problem, $d$-failure connectivity, decremental single-source shortest paths, and decremental transitive closure, this conjecture implies tight hardness results. Thus, proving or disproving this conjecture will be very interesting as it will either imply several tight unconditional lower bounds or break through a common barrier that blocks progress with these problems. This conjecture might also be considered as strong evidence against any further improvement for these problems, since refuting it would imply a major breakthrough for combinatorial Boolean matrix multiplication and other long-standing problems if the term "combinatorial algorithms" is interpreted as "non-Strassen-like algorithms" [Ballard et al. SPAA'11]. The conjecture also leads to hardness results for problems that were previously based on diverse problems and conjectures, such as 3SUM, combinatorial Boolean matrix multiplication, triangle detection, and multiphase, thus providing a uniform way to prove polynomial hardness results for dynamic algorithms; some of the new proofs are also simpler or even become trivial. The conjecture also leads to stronger and new, non-trivial, hardness results.

### QuantOverflow

#### Computing $\gamma$ and $\mu$ at the efficient frontier

Consider the condition which the weights of any portfolio belonging to the efficient frontier satisfy: $$\gamma\boldsymbol{w}\boldsymbol{C} = \boldsymbol{m} - \mu\boldsymbol{u}$$ Assume we have three securities, $\boldsymbol{w} = [w_1,w_2,w_3]$, $\boldsymbol{m} = [\mu_1,\mu_2,\mu_3]$, $\boldsymbol{u} = [1,1,1]$, and $\gamma = \dfrac{\mu_V-\mu}{\sigma^2_V}$. Here $\boldsymbol{C}$ is the covariance matrix, $\mu_V$ is the expected return of this portfolio, $w_i$ is the weight and $\mu_i$ is the expected return on security $i$. What I need to do is compute the values of $\gamma$ and $\mu$ such that the weights $\boldsymbol{w}$ satisfy $\gamma\boldsymbol{w}\boldsymbol{C} = \boldsymbol{m} - \mu\boldsymbol{u}$. Assume that we have all values except $\mu$ and $\gamma$. The way to do this, according to my book, is to first multiply this condition (equality) by $\boldsymbol{C}^{-1}\boldsymbol{u}^T$ and, respectively, by $\boldsymbol{C}^{-1}\boldsymbol{m}^T$, so that we get: $$\mu_V(\boldsymbol{m} - \mu\boldsymbol{u})\boldsymbol{C}^{-1}\boldsymbol{u}^T = (\boldsymbol{m}-\mu \boldsymbol{u})\boldsymbol{C}^{-1}\boldsymbol{m}^T$$ (since $\boldsymbol{w}\boldsymbol{u}^{T} = 1$ and $\boldsymbol{w}\boldsymbol{m}^T = \mu_V$; also, if it is of any use, $\sigma^2_V = \boldsymbol{w}\boldsymbol{C}\boldsymbol{w}^T$). However, I don't see how to obtain this equality the way the book describes: $\gamma\boldsymbol{w}\boldsymbol{C} (\boldsymbol{C}^{-1}\boldsymbol{u}^T)(\boldsymbol{C}^{-1}\boldsymbol{m}^T) = (\boldsymbol{m} - \mu\boldsymbol{u}) (\boldsymbol{C}^{-1}\boldsymbol{u}^T)(\boldsymbol{C}^{-1}\boldsymbol{m}^T)$. On the left-hand side I get $\gamma (\boldsymbol{C}^{-1}\boldsymbol{m}^T)$. Anyhow, I don't see how to get from this to the equation above (how the book did it). Could someone show me how they did it, or what I am missing? The next step would be to solve for $\mu$, as follows: $$\mu = \dfrac{\boldsymbol{m} \boldsymbol{C}^{-1}( \boldsymbol{m}^T - \mu_V \boldsymbol{u}^T) }{\boldsymbol{u} \boldsymbol{C}^{-1}( \boldsymbol{m}^T - \mu_V \boldsymbol{u}^T)}$$ This is obtained from: $$\mu_V(\boldsymbol{m} - \mu\boldsymbol{u})\boldsymbol{C}^{-1}\boldsymbol{u}^T = (\boldsymbol{m}-\mu \boldsymbol{u})\boldsymbol{C}^{-1}\boldsymbol{m}^T$$ Once again, I don't know how they did it. Could someone show the steps in more detail?

### StackOverflow

#### Using machine learning to parse complicated strings containing chemical data?

I have very poor (dirty?)
chemical information data, which has the following format:

ID    Chemicals
1701  3 Tanks - 1 - Benzoyl Chloride and 2 - Benzoflex
1840  Two 520 Class IIIB inside and Two 16,800 Condensate tanks
1840  Two 520 Class IIIB inside and Two 16,800 Condensate tanks
1938  2 tanks - 1,100 gallons diesel & 1,100 gallons gasoline
1888  4 tanks - 3 - 20,000 gallon and 1 - 10,000 gallon Gas, Diesel and K-1

I need to parse this data to search for recognizable chemicals in each super-string. After parsing this data, I can search through commonly available chemistry databases for the subsets to return hits (of different quality) for each subset. The main problem is that I don't know how to begin parsing this data in an efficient and structured manner. There are several ideas I am toying with:

1. Parse each superstring into all combinations of substrings, using whitespace as a delimiter for the substrings, and then search for all combinations of the substrings.
2. Do the above, but only after removing key words I know not to be useful (tanks, class, inside, etc.).
3. Use a machine learning algorithm with supervised learning to parse the data, with the supervision being feedback from me on whether the parsed data supplied a useful match from the external chemistry database.

Right now I am attempting method 2, after implementing method 1 with horrible results, but I am finding that building and maintaining the list of 'keys' to ignore is proving far too cumbersome. If I wish to follow option 3, what Python machine learning libraries can provide this capability?

#### Polymorphic function over types combined by typeclass

Consider such domain logic: there are three types of users: Civilians, ServiceMembers and Veterans. Each of them has a 'name', stored in a different attribute. The task is to write a function accepting each of the types and returning the char 'C' for Civilians, 'V' for Veterans and 'S' for ServiceMembers.
I have these record declarations:

data ServiceMemberInfo = ServiceMemberInfo { smname::String }
data VeteranInfo = VeteranInfo { vname::String }
data CivilianInfo = CivilianInfo { cname::String }

My first idea was to combine them with a typeclass:

class UserLetter a where
  userLetter :: a -> Char

and implement the instances:

instance UserLetter ServiceMemberInfo where
  userLetter _ = 'S'
instance UserLetter VeteranInfo where
  userLetter _ = 'V'
instance UserLetter CivilianInfo where
  userLetter _ = 'C'

In this case, userLetter is the function I wanted. But I would really like to write something like this (without typeclasses):

userLetter1 :: UserLetter a => a -> Char
userLetter1 (CivilianInfo _) = 'C'
userLetter1 (ServiceMemberInfo _) = 'S'
userLetter1 (VeteranInfo _) = 'V'

which throws a compilation error: 'a' is a rigid type variable bound by ... Another way is to use an ADT:

data UserInfo = ServiceMemberInfo { smname::String }
              | VeteranInfo { vname::String }
              | CivilianInfo { cname::String }

Then the userLetter1 declaration becomes obvious:

userLetter1 :: UserInfo -> Char
userLetter1 (CivilianInfo _) = 'C'
userLetter1 (ServiceMemberInfo _) = 'S'
userLetter1 (VeteranInfo _) = 'V'

But let's say I don't have control over the ServiceMemberInfo (and other) declarations. How can userLetter1 be defined? Is there a way to declare one ADT with the existing ServiceMemberInfo (and other) types?

### Lobsters

#### doitlive - fake live terminal demonstrations

This program lets you generate terminal sessions that appear to be live as you bash random keys on the keyboard.

#### Fail at Scale

### HN Daily

#### Daily Hacker News for 2015-11-23

## November 23, 2015

### QuantOverflow

#### Bond portfolio hedging against currency risk

How do I hedge a bond portfolio against currency risk? Ideally I'm looking for books or other references on this topic.

### TheoryOverflow

#### How to show a language is regular?
Let$\sum_{2} = \{\begin{bmatrix} 0 \\ 0 \end{bmatrix}\begin{bmatrix} 0 \\ 1\end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} \begin{bmatrix}\ 1 \\ 1 \end{bmatrix} \}\sum_{2}$contains all columns of$0$s and$1$s of height two. A string of symbols in$\sum_{2}$gives two rows of$0$'s and$1$'s. Consider each row to be a binary number and let$L = \{w \in \sum_{2}^* \text{the bottom row of w is three times the top row}\}.$For example,$\sum_{2} =\{\begin{bmatrix} 0 \\ 0 \end{bmatrix}\begin{bmatrix} 0 \\ 1\end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} \begin{bmatrix}\ 1 \\ 1 \end{bmatrix}\} \in L, \text{but} \sum_{2} = \{\begin{bmatrix} 0 \\ 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} \notin L \}. $Show that L is regular. I am trying to solve this question a while, however, I cannot find any convenient solution. Would you like to help me? ### QuantOverflow #### Bond Prices in terms of short and forward rates Of course, a pure discount bond price$P(t,T)$may be stated in terms of its yield$R(t,T)$as $$P(t,T) = e^{-R(t,T)(T-t)}.$$ Let's assume both the (instantaneous) short rate$r(t)$and (instantaneous) forward rate$f(t,T)are deterministic functions. The relationships to discount bond prices are \begin{align} r(t) & = -\frac{\partial}{\partial T} \log P(t,t), \\ f(t,T) & = -\frac{\partial}{\partial T} \log P(t,T). \end{align} From this, it is clear that $$P(t,T) = \exp\left(-\int_t^T f(t,u) \, du\right) \qquad (1).$$ On the other hand, the bond price is often stated in terms of risk-neutral expectations using the short rate, such as $$P(t,T) = E_Q\left(\exp\left(-\int_t^T r(u) \, du\right) \mid \mathcal{F}_t\right),$$ and since I am assumingr(t)$is deterministic, it should be that $$P(t,T) = \exp\left(-\int_t^T r(u) \, du\right) \qquad (2).$$ Comparing Eqns (1) and (2), it seems like $$\int_t^T r(u) \, du = \int_t^T f(t,u) \, du.$$ Does this even make sense? 
Furthermore, since $P(t,T) = e^{-R(t,T)(T-t)}$, we would get $$R(t,T) = \frac{1}{T-t}\int_t^T f(t,u) \, du = \frac{1}{T-t}\int_t^T r(u) \, du.$$ That is, the yield is both the average of the instantaneous forward rate (this is true) and the average of the instantaneous spot rate. Is this latter statement true?

### CompsciOverflow

#### Help with proof involving weighted full binary tree

Given a full binary tree $T$ (each node is either a leaf or possesses exactly two children) with $n$ leaf nodes $v_1,v_2,...,v_n$, and weights $w_1,w_2,...,w_n$ associated with the leaf nodes, the cost of the tree, denoted by $|T|$, is defined as: $$|T|=\sum_{i=1}^{n}w_il_i$$ where $l_i$ is the depth of $v_i$ in $T$. Example: the cost of the above tree is $0.25\times 4+ 0.18\times 4+0.5\times 3+0.4\times 2+0.31\times 3+0.2\times 4+0.1\times 4+0.65\times 3+0.2\times 3=8.7$. Now, I need to prove two things. I'll start with the first.

1. Let $T$ be a full binary tree with $n$ terminal (leaf) nodes; then $T$ has $n-1$ internal nodes.

The proof for this is not too difficult: using induction on the number of terminal nodes, I can easily prove it. The second is not so trivial, and this is where I need help. Before I move on to what I'm trying to prove, I'll introduce another concept. Let's now associate weights with internal nodes as well: the weight of an internal node is the sum of the weights of its two children. Example: Now for the shocker:

2. Given a full binary tree, the sum of the weights of the $n-1$ internal nodes is the cost of the tree.

This looks surprising at first, but when you think about it again, it makes sense (the picture above demonstrates it). My problem is proving it. I tried proving it by induction, but I got stuck in the inductive step (the base case is obvious). Any help would be greatly appreciated.

**EDIT** Let me show exactly where I'm stuck. The base case is easy.
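Claim 2 is also easy to check computationally before proving it. A minimal sketch (random full binary trees, with hypothetical helper names of my own, not from the question): compare the leaf-depth cost with the sum of the derived internal-node weights.

```python
import random

class Node:
    def __init__(self, weight=None, left=None, right=None):
        self.weight = weight          # set for leaves; derived for internal nodes
        self.left, self.right = left, right

def cost(node, depth=0):
    # |T| = sum over leaves of (leaf weight) * (leaf depth)
    if node.left is None:
        return node.weight * depth
    return cost(node.left, depth + 1) + cost(node.right, depth + 1)

def internal_sum(node):
    # Returns (total leaf weight of subtree, sum of internal-node weights)
    if node.left is None:
        return node.weight, 0.0
    lw, ls = internal_sum(node.left)
    rw, rs = internal_sum(node.right)
    return lw + rw, ls + rs + (lw + rw)   # internal weight = sum of children

def random_full_tree(n_leaves):
    # Every node is a leaf or has exactly two children
    if n_leaves == 1:
        return Node(weight=random.random())
    k = random.randint(1, n_leaves - 1)
    return Node(left=random_full_tree(k), right=random_full_tree(n_leaves - k))

random.seed(1)
t = random_full_tree(9)
print(cost(t), internal_sum(t)[1])  # the two quantities coincide
```

The check mirrors the reason the claim holds: each leaf's weight is counted once in every proper ancestor, i.e. exactly depth-many times.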
The first way @Yuval Filmus suggested: for the general case, since the number of leaves $n$ is bigger than $1$, the tree can be split into two trees: the left subtree of the root, call it $T_l$, and the right subtree of the root, call it $T_r$, with $n_l$ and $n_r$ leaves respectively ($n_r,n_l\geq 1$ and $n_l+n_r=n$). Now, according to the induction hypothesis, the cost of $T_l$ is the sum of the $n_l-1$ internal nodes in $T_l$, and the cost of $T_r$ is the sum of the $n_r-1$ internal nodes in $T_r$. I don't know what exactly that implies with regard to the entire tree $T$...

The second way @Yuval Filmus suggested: for the general case, $T$ (with its $n$ leaves) is obtained from a smaller tree, call it $T'$, in which two leaves were removed. According to the induction hypothesis, the cost of $T'$ is the sum of its $(n-1)-1=n-2$ internal nodes. Again, how should I proceed? I feel like something is missing... So suppose I say the cost of $T'$ is $|T'|=\sum_{i=1}^{n-2}w'_i$, where $w'_i$ corresponds to an internal node (it is denoted with an apostrophe on purpose, to distinguish these weights from the $w_i$s, the weights of the terminal nodes). Am I on the right track?

### QuantOverflow

#### Black Scholes - how to calculate delta with a vol skew

I am trying to calculate the delta of an option at different strike prices where the underlying has a pronounced implied volatility skew, in order to correctly hedge an options strategy. Researching on the net and previous questions on this site imply that BS can be used, but inputting the correct IV is the hard part. Tags like "the wrong number in the wrong formula to get the right price", "sticky delta vs sticky strike", "skew adjusted delta" and Derman's work are the solutions I have found so far. Can anyone tell me if these are the latest or best methods, or is a stochastic vol model like SABR or Heston better? Is calculating one value for the position delta too optimistic - should the position delta actually be a range with associated confidence limits?
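The "sticky strike" vs skew-adjusted distinction can be made concrete with a sketch. Everything below (the linear skew, the assumed $d\sigma/dS$) is invented for illustration and is not calibrated to anything in the question:

```python
from math import erf, exp, log, pi, sqrt

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def d1(S, K, T, r, sigma):
    return (log(S / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * sqrt(T))

def bs_call_delta(S, K, T, r, sigma):
    # "Sticky strike": plug the IV quoted at this strike straight into BS
    return norm_cdf(d1(S, K, T, r, sigma))

def bs_vega(S, K, T, r, sigma):
    x = d1(S, K, T, r, sigma)
    return S * sqrt(T) * exp(-0.5 * x * x) / sqrt(2.0 * pi)

S0, T, r = 100.0, 0.5, 0.01

def skew_iv(K):
    # Hypothetical linear skew: IV falls as strike rises (0.05 per 10% of spot)
    return 0.20 - 0.5 * (K / S0 - 1.0)

def skew_adjusted_delta(K, dsigma_dS):
    # Skew-adjusted delta = BS delta + vega * (how IV at K moves with spot);
    # dsigma_dS is an assumption about the skew dynamics, not observable
    sigma = skew_iv(K)
    return bs_call_delta(S0, K, T, r, sigma) + bs_vega(S0, K, T, r, sigma) * dsigma_dS

plain = bs_call_delta(S0, 100.0, T, r, skew_iv(100.0))
adj = skew_adjusted_delta(100.0, dsigma_dS=0.0005)
print(plain, adj)
```

The sign and size of `dsigma_dS` is exactly the "sticky delta vs sticky strike" debate: sticky strike corresponds to `dsigma_dS = 0`, while sticky-moneyness dynamics give a nonzero adjustment.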
### Planet Emacsen

#### emacspeak: Listening To Multiple Media Streams On The Emacspeak Audio Desktop

## 1 Executive Summary

The GitHub version of Emacspeak now supports launching and controlling multiple media streams. This enables one to listen to the news while playing a music stream, or relaxing nature sounds.

## 2 Sample Usage

Here are some examples of using this feature:

1. Launch your favorite news station — BBC World Service in my case — C-e ; RET.
2. Place the News on the left channel — C-e ; (.
3. Persist the currently playing News stream by invoking command emacspeak-m-player-persist-stream bound to C-e ; \. This lets you launch a second stream via Emacspeak media key C-e ; rather than controlling the currently playing stream.
4. Launch a classical music media-stream — C-e ; lu RET for a lullaby media stream.
5. Now Emacspeak M-Player commands will control the most recently launched stream; you can once again invoke command emacspeak-m-player-persist-stream if you wish.
6. The previously launched (and still playing) News stream is now in a buffer named *Persistent-...*. Command emacspeak-wizards-view-buffers-filtered-by-m-player-mode can be used to list buffers that hold a live m-player instance. It is bound to b in emacspeak-m-player-mode. I also bind this command to C-; ; in my global keymap.
7. You can make an M-Player instance current by switching to its buffer and invoking command emacspeak-m-player-restore-process bound to / in emacspeak-m-player-mode.

Share And Enjoy–

### StackOverflow

#### How do I find which attributes my tree splits on, when using scikit-learn?

I have been exploring scikit-learn, making decision trees with both entropy and gini splitting criteria, and exploring the differences.
My question is: how can I "open the hood" and find out exactly which attributes the trees are splitting on at each level, along with their associated information values, so I can see where the two criteria make different choices? So far, I have explored the 9 methods outlined in the documentation. They don't appear to allow access to this information. But surely this information is accessible? I'm envisioning a list or dict that has entries for node and gain. Thanks for your help, and my apologies if I've missed something completely obvious.

### Lobsters

#### The Seven Righteous fights

#### There Is No Thread

#### Reducing Noise in Arabic Script

### QuantOverflow

#### Constructing a Brownian motion from a Simple Random Walk

I'm trying to get my head around how a Brownian motion is formed from a simple random walk. I've seen two similar methods used. Why has one approach used $\frac{1}{\sqrt{k}}$ and the other hasn't? Why are they both valid? The second approach suggests $\frac{1}{\sqrt{k}}$ was added so that the resulting Brownian motion followed a normal distribution by the central limit theorem. Is this still the case for the first approach?

### Lobsters

#### Submit to EmberConf CFP

### StackOverflow

#### How to select a bunch of optimized data from a larger data set?

I have a data set containing the products to be sold to customers. The products were sold before, so each one has information such as reviews, ratings, and how many units were sold. Now, I want to select a subset of optimized data from the larger data set based on the number of units sold, ratings, reviews, and all the other information. The purpose is to increase the number of units sold in the future. How do I do this? What kind of model, method, or algorithms should be used here? If it is related to machine learning, what kind of machine learning tools should I use? Thank you so much.
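For the selection question above, one simple baseline (a sketch with invented feature values and column names, not a recommendation of a particular model) is to regress past units sold on product features and rank products by predicted sales:

```python
import numpy as np

# Hypothetical historical data: one row per product
# columns: average rating, number of reviews, price
X = np.array([
    [4.5, 120, 19.99],
    [3.8,  40, 24.99],
    [4.9, 300, 14.99],
    [2.9,  10,  9.99],
])
units_sold = np.array([500.0, 150.0, 900.0, 60.0])

# Least-squares linear model with an intercept column
A = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(A, units_sold, rcond=None)

# Score (predicted future sales) for every product, then rank descending
scores = A @ coef
ranking = np.argsort(scores)[::-1]
print(ranking)
```

In practice one would hold out data to validate the model and likely move to a regularized or tree-based regressor; this only illustrates the "score then rank" shape of the problem.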
#### PyBrain building NN for Machine Learning assignment I have an assignment for my Machine Learning class, which asks me to do the following: 1) Build supervised neural network on given data(i will discuss it shortly); 2) Save learned weights of neural network; 3) Give new data which classifies the data; 1. I was given data in csv format for alphabetical letters. For example "a" is represented by 18100 rows of 13 columns (1.554,-16.332,4.482,-8.687,-6.4,-13.585,-2.753,5.01,1.68,-11.699,-2.837,3.63,77.649), and class which is (for example) [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]. I have created both of them in arrays: First one with columns x1,x2,x3,x4,x5,x6,x7...x13. And second array consisting of Y [1,0,0,0,0...0]. 2. For all 22 letters i have around 200000 rows in csv. Now i need to train my NN and remember weights after the training is done. According to my teacher it shouldn't learn that good, only 30-40% of success rate. 3. Finally i will record a short word like "Hello". I have algorithm which turns audio into the same column size (x1;x2...x13) and N rows csv file. I need to pass it through my NN with already saved weights and return me last neurons outputs. For clarification what i mean - Input is 13 column row. Pass it through weights: input * W1 = S1. Then pass it through activation function (like sigmoid) we get Z1. Then another weight pass: Z1 * W2 = S2. And through activation function we get Z2. I need all those Z2 outpus, which kinda represents probability rates of belonging to a class. Row of each class looks like this: 1.554 -16.332 4.482 -8.687 -6.4 -13.585 -2.753 5.01 1.68 -11.699 -2.837 3.63 77.649 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.885 -1.867 5.005 0.727 -2.883 -4.72 -2.715 -8.903 -8.554 -6.694 -12.358 -4.156 60.951 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1.713 -4.533 14.485 1.627 -3.337 -10.973 8.339 -3.397 -2.515 -8.093 -4.131 -1.378 63.679 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 
-2.119 -4.231 18.443 -9.469 -4.751 -10.955 3.707 -3.068 3.584 -5.529 1.68 -3.396 66.592 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1  Converted to Numpy arrays(tiny sample): [[ 1.554 -16.332 4.482 -8.687 -6.4 -13.585 -2.753 5.01 1.68 -11.699 -2.837 3.63 77.649] [ 0.419 -15.77 3.805 -7.281 -9.612 -13.402 -1.787 4.708 1.32 -12.985 -0.754 3.356 78.381] [ -0.529 -15.11 3.47 -6.963 -10.676 -13.105 -1.002 2.196 2.112 -13.028 0.398 3.982 78.767] [ -1.419 -14.583 3.275 -7.228 -10.093 -14.126 -0.611 0.777 2.11 -12.581 1.939 4.161 78.914] [ -2.315 -13.639 3.059 -6.762 -9.836 -13.252 -0.925 1.568 2.261 -11.804 4.219 3.942 79.126] [ -0.662 -2.546 12.354 5.07 -12.44 -20.427 -5.518 -0.928 -0.987 7.868 -0.636 -3.885 65.318] [-17.817 -4.473 3.502 -2.069 7.663 -3.484 1.252 -8.484 1.982 -7.308 3.906 7.123 51.078] [-14.027 -9.039 0.915 -11.172 -0.704 -7.17 3.241 -2.844 7.462 -1.38 6.008 -7.331 57.459] [-12.503 -13.675 0.885 -13.739 -0.455 -11.39 3.986 -1.809 9.81 0.633 9.981 -10.02 64.061] [ 6.206 -12.486 -4.009 -7.346 1.106 -9.228 -7.5 -8.711 6.736 3.73 -3.41 -5.319 74.8 ] [ 6.583 -13.107 -2.495 -6.577 1.882 -8.514 -8.829 -8.241 6.478 3.315 -5.853 -3.36 74.295] [ 6.523 -13.319 -0.533 -4.972 1.131 -10.015 -8.946 -6.761 8.373 2.887 -6.759 -1.486 73.492]] [[1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0] [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0] [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0] [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0]]  The plan is: 1. Normalize data(as some columns have big values); 2. Load it into PyBrain understandable data set; 3. Create NN network and train it; 4. Extract weights; 5. 
Use the extracted weights to calculate Z2 values for the given new data/sample.

The plan basically spells out my questions: I have no idea how to implement any of these steps. I have written a basic neural network from scratch and it works on the Iris database, but it cannot classify my assignment database. That's why I am looking to finish the job with PyBrain, or more likely redo it.

EDIT. I have made a small sample of each letter (~10%). So far I have this:

DATA3 = pd.read_csv('DATA.csv', header=None)
XX = np.array(DATA3)
dsXX = SupervisedDataSet(13, 22)
for line in XX:
    indataXX = tuple(line[:13])
    outdataXX = tuple(line[13:])
    dsXX.addSample(indataXX, outdataXX)
fnnXX = buildNetwork(len(indataXX), 13, len(outdataXX), outclass=SoftmaxLayer)
trainer = BackpropTrainer(fnnXX, dataset=dsXX)
trainer.trainOnDataset(dsXX, 1000)

How can I obtain the learned network weights? How can I find out the success rate of classification?

#### Bootstrapping confidence interval from a regression prediction

For homework, I was given data to create/train a predictor that uses lasso regression. I created the predictor and trained it using the lasso Python library from scikit-learn. So now I have a predictor that, when given input, can predict the output. The second question was to "Extend your predictor to report the confidence interval of the prediction by using the bootstrapping method." I've looked around and found examples of people doing this for the mean and other things, but I am completely lost on how I'm supposed to do it for a prediction. I am trying to use the scikit-bootstrap library. The course staff is being extremely unresponsive, so any help is appreciated. Thank you.

### CompsciOverflow

#### Number of special paths between two nodes

Consider a directed graph. Each node in this graph has an integer label. We want to count the number of special paths between the source and the sink. Let's define a variable named value. Every path starts with value = label[source].
When we move from node A to B, value changes like this: value = lcm(label[B], value), where lcm is the least common multiple. A special path has two conditions:

1- During the move from source to sink, value should always change. That is, when moving from node A to B, if value is unchanged before and after the move, that path is not special.
2- value should equal some predetermined integer k at the end of the path.

In how many ways can we go from source to sink without contradicting the above conditions?

I think we can remove every node with lcm(label[node], k) != k, because such a node cannot be on any special path. Also, condition one removes every loop from the graph. Now the only algorithm I know for counting all the paths between two nodes uses dynamic programming, but I can't reduce this problem to that. I could also compute the result using backtracking, but as there can be an exponentially large number of such paths, it's not efficient enough.

original problem statement

### TheoryOverflow

#### Approximation algorithm for minimal projection on collection of subspaces

Suppose you have a collection of vector subspaces $V_1, V_2, ...$ of $R^n$, and you are given a vector $v$. Let $v_i$ denote the projection of our vector onto subspace $V_i$. I am interested in algorithms that find the $i$ such that $||v_i||$ (under maybe an arbitrary norm) is minimal. It's not hard to show that this problem is NP-hard for certain families of subspaces. However, I am wondering if there are algorithms that achieve a good approximation factor for it (maybe with restrictions on the actual subspaces)? To clarify: the families of subspaces themselves are part of the problem specification (and depend on the dimension of the vector), and are not given as input to the algorithm. Ideally, the algorithm would be polynomial in the size of the vector.
This makes this somewhat of an open-ended problem, as I'm looking for information about families of vector subspaces for which this problem has been studied, rather than information about a specific family of subspaces.

### CompsciOverflow

#### In a $k$-way set associative cache, main memory block mapping in range?

In a $k$-way set associative cache, the cache is divided into $v$ sets, each of which consists of $k$ lines. The lines of a set are placed in sequence one after another. The lines in set $s$ are sequenced before the lines in set $(s+1)$. The main memory blocks are numbered 0 onwards. The main memory block numbered $j$ must be mapped to any one of the cache lines from

1. $(j\text{ mod }v) * k \text{ to } (j \text{ mod } v) * k + (k-1)$
2. $(j \text{ mod } v) \text{ to } (j \text{ mod } v) + (k-1)$
3. $(j \text{ mod } k) \text{ to } (j \text{ mod } k) + (v-1)$
4. $(j \text{ mod } k) * v \text{ to } (j \text{ mod } k) * v + (v-1)$

Somewhere it is explained as: the number of sets in the cache is v, so main memory block j will be mapped to set (j mod v), which will be any one of the cache lines from (j mod v) * k to (j mod v) * k + (k-1). (Associativity plays no role in the mapping: k-way associativity means there are k spaces for a block and hence reduces the chances of replacement.) I'm not getting this solution. Can you explain a little why we multiply by k? (Include a diagram if possible.)

### AWS

#### Now Available – EC2 Dedicated Hosts

Last month, I announced that we would soon be making EC2 Dedicated Hosts available. As I wrote at the time, this model allows you to control the mapping of EC2 instances to the underlying physical servers. Dedicated Hosts allow you to:

• Bring Your Own Licenses – You can bring your existing server-based licenses for Windows Server, SQL Server, SUSE Linux Enterprise Server, and other enterprise systems and products to the cloud.
Dedicated Hosts provide you with visibility into the number of sockets and physical cores that are available so that you can obtain and use software licenses that are a good match for the actual hardware.

• Help Meet Compliance and Regulatory Requirements – You can allocate Dedicated Hosts and use them to run applications on hardware that is fully dedicated to your use.

• Track Usage – You can use AWS Config to track the history of instances that are started and stopped on each of your Dedicated Hosts. This data can be used to verify usage against your licensing metrics.

• Control Instance Placement – You can exercise fine-grained control over the placement of EC2 instances on each of your Dedicated Hosts.

Available Now

I am happy to announce that Dedicated Hosts are available now and that you can start using them today. You can launch them from the AWS Management Console, AWS Command Line Interface (CLI), AWS Tools for Windows PowerShell, or via code that makes calls to the AWS SDKs.

Let's provision a Dedicated Host and then launch some EC2 instances on it via the Console! I simply open up the EC2 Console, select Dedicated Hosts in the left-side navigation bar, and click on Allocate a Host. I choose the instance type (Dedicated Hosts for M3, M4, C3, C4, G2, R3, D2, and I2 instances are available), the Availability Zone, and the quantity (each Dedicated Host can accommodate one or more instances of a particular type, all of which must be the same size).

If I choose to allow instance auto-placement, subsequent launches of the designated instance type in the chosen Availability Zone are eligible for automatic placement on the Dedicated Host, and will be placed there if instance capacity is available on the host and the launch specifies a tenancy of Host without specifying a particular one. If I do not allow auto-placement, I must specifically target this Dedicated Host when I launch an instance.
When I click Allocate host, I'll receive confirmation that it was allocated. Billing for the Dedicated Host begins at this point. The size and number of instances running on it do not have an impact on the cost.

I can see all of my Dedicated Hosts at a glance. Selecting one displays detailed information about it. As you can see, my Dedicated Host has 2 sockets and 24 cores. It can host up to 22 m4.large instances, but is currently not hosting any.

The next step is to run some instances on my Dedicated Host. I click on Actions and choose Launch Instance(s) onto Host (I can also use the existing EC2 launch wizard). Then I pick an AMI. Some AMIs (currently RHEL, SUSE Linux, and those which include Windows licenses) cannot be used with Dedicated Hosts, and cannot be selected in the screen below or from the AWS Marketplace. The instance type is already selected. Instances launched on a Dedicated Host must always reside within a VPC. A single Dedicated Host can accommodate instances that run in more than one VPC.

The remainder of the instance launch process proceeds in the usual way, and I have access to the options that make sense when running on a Dedicated Host. You cannot, for example, run Spot instances on a Dedicated Host.

I can also choose to target one of my Dedicated Hosts when I launch an EC2 instance in the traditional way. I simply set the Tenancy option to Dedicated host and choose one of my Dedicated Hosts (I can also leave it set to No preference and have AWS make the choice for me). If I select Affinity, a persistent relationship will be created between the Dedicated Host and the instance. This gives you confidence that the instance will restart on the same Host, and minimizes the possibility that you will inadvertently run licensed software on the wrong Host.
If you import a Windows Server image (to pick one that we expect to be popular), you can keep it assigned to a particular physical server for at least 90 days, in accordance with the terms of the license. I can return to the Dedicated Hosts section of the Console, select one of my Hosts, and learn more about the instances that are running on it.

Using & Tracking Licensed Software

You can use your existing software licenses on Dedicated Hosts. Verify that the terms allow the software to be used in a virtualized environment, and use VM Import/Export to bring your existing machine images into the cloud. To learn more, read about Bring Your Own License in the EC2 Documentation. To learn more about Windows licensing options as they relate to AWS, read about Microsoft Licensing on AWS and our detailed Windows BYOL Licensing FAQ.

You can use AWS Config to record configuration changes for your Dedicated Hosts and the instances that are launched, stopped, or terminated on them. This information will prove useful for license reporting. You can use the Edit Config Recording button in the Console to change the settings (hovering your mouse over the button will display the current status). To learn more, read about Using AWS Config.

Some Important Details

As I mentioned earlier, billing begins when you allocate a Dedicated Host. For more information about pricing, visit the Dedicated Host Pricing page. EC2 automatically monitors the health of each of your Dedicated Hosts and communicates it to you via the Console. The state is normally available; it switches to under-assessment if we are exploring a possible issue with the Dedicated Host. Instances launched on Dedicated Hosts must always reside within a VPC, but cannot make use of Placement Groups. Auto Scaling is not supported, and neither is RDS.
Dedicated Hosts are available in the US East (Northern Virginia), US West (Oregon), US West (Northern California), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Sydney), and South America (Brazil) regions. You can allocate up to 2 Dedicated Hosts per instance family (M4, C4, and so forth) per region; if you need more, just ask.

Jeff;

### QuantOverflow

#### Meaning of conservative in risk management?

I believe this question is best asked here, as it pertains to risk, rather than on English SE. What is the meaning of conservative in the context of risk management? In general, conservative would mean small or comparatively small, but coming across the term in different industry papers, I get the feeling it is the opposite. For example, if one applies a conservative haircut to a counterparty's collateral, is this a small haircut? Another example, this from the Fed's paper on capital planning at bank holding companies: "The Federal Reserve expects BHCs to apply generally conservative assumptions throughout the stress testing process." In this context, I can't imagine they are suggesting making stressed variables small.

### StackOverflow

#### How to combine Processing with Sublime Text 2 [Windows 7, X64]

Hello. I'd like to combine Processing with Sublime Text 2, but I can't get it to work.

WHAT I DID:

1. Installed Processing and Sublime Text 2.
2. Installed Package Control in Sublime Text 2 using the code from packagecontrol.io.
3. Installed Processing in Sublime Text 2 using Package Control.
4. After that I followed the instructions from https://github.com/b-g/processing-sublime to install processing-java:
   1. Open the "Advanced System Settings" by running sysdm.cpl
   2. In the "System Properties" window, click on the Advanced tab.
   3. In the "Advanced" section, click the Environment Variables button.
   4. Edit the "Path" variable.
   5. Append the Processing path (e.g. ;C:\Program Files\Processing-2.0b6) to the variable value.
Each entry is separated with a semicolon.

SO, RIGHT NOW I HAVE:

PATH: C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Program Files (x86)\Processing-2.2.1

Path: C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;%SYSTEMROOT%\System32\WindowsPowerShell\v1.0\;Z:\usr\local\php5;C:\Program Files\Common Files\Autodesk Shared\;Z:\home\yii\framework\;C:\Program Files (x86)\Skype\Phone\;C:\Program Files (x86)\QuickTime\QTSystem\;C:\Program Files (x86)\Processing-2.2.1

BUT: When I click "Build" in Sublime Text 2, I get this in the console:

Running processing-java --sketch=C:\Arthur\Generative Art\test 1\build --output=C:\Артур\Generative Art\test 1\build/build-tmp --run --force
Traceback (most recent call last):
  File ".\sublime_plugin.py", line 337, in run_
  File ".\exec.py", line 154, in run
  File ".\exec.py", line 45, in init
UnicodeDecodeError: 'ascii' codec can't decode byte 0xca in position 0: ordinal not in range(128)

And Processing doesn't show the result of the code. HOW can I fix that? Please help me!

### CompsciOverflow

#### Using Pushdown Automata to prove a language is context free [duplicate]

This question already has an answer here: How do I prove that the language L is context free using Pushdown Automata? I would like to know the process of proving it.

### Dave Winer

#### About 1999.io (coming soon)

A few notes about the new blogging software I'm working on. It continues on the same approach I took with Manila, and refined with Radio UserLand. A home page from which everything radiates. A focus on simplicity, an intense level of factoring to reduce the number of steps it takes to post something new or edit an existing post. I wanted the fluidity of Twitter and Facebook. It should be just as easy to create a new post as it is to write a tweet, of course without the 140 char limit. Here's a screen shot of me editing the initial version of this post.
The central innovation of Manila was Edit This Page. I take that one step further in this product. If you see something that needs changing, just click, edit, Save. This is easier than Facebook, and of course editing posts is not possible in Twitter. I think of this as the first post-Twitter post-Facebook blogging system. The motto of the software: Blogging like it's 1999! The name of the product: 1999.io. PS: This is a 1999.io post.

#### New version of WordPress

The new version of WordPress -- released today -- is a Node app. And they have a Mac desktop app. I've tried them both, and they're really nice, and it's still WordPress. The product has the same familiar organization and structure. At the same time I'm finishing my own Node-based blogging system. It's really cool that WordPress is running in the same environment. There may be some interesting integrations possible as a result. But first I have to ship. :->

### Lobsters

#### Dance to Calypso - Wordpress' Node + React project

### QuantOverflow

#### Alternative ways to understand time-varying comovement between two time-series?

I have been looking into ways to better understand how the dependencies/correlations/etc. between two time series can vary over time. I first thought about using a Kalman/particle filter over a linear model to get a time-varying slope estimate. However, I'm worried that this will also pick up changing relative variances between the two time series, and an increasing slope estimate doesn't actually mean a stronger relationship between the two time series. I have looked into time-varying quantile regression but am unconvinced that a changing slope parameter means much, even if it accounts for asymmetry over the various quantiles. The same thing goes for the new time-varying cointegration technology.
I have other reservations about DCC-GARCH because, to the best of my knowledge, it's a time-varying estimate of the Pearson estimator and is therefore not able to pick up non-linear dependencies (since it's essentially the time-varying square root of the $R^2$ of a linear regression). I'm concerned that the DCC-GARCH correlation estimate might decrease because linear dependencies are reducing, even if non-linear dependencies are increasing. So what else is there, and how can it help me pick up time-varying dependencies between two time series that account for both linear and non-linear relationships? Something like a time-varying Kendall tau or time-varying mutual information would be nice.

#### Real Time/Historical weather data

I am looking to incorporate weather data into my algorithmic models. What is a good source for historical + real-time weather data by zipcode or region? Any help will be appreciated! Preferably an API.

### Planet Theory

#### Talk, be merry, and be rational

Yesterday I wrote a statement on behalf of a Scott Alexander SlateStarCodex/rationalist meetup, which happened last night at MIT (in the same room where I teach my graduate class), and which I'd really wanted to attend but couldn't. I figured I'd share the statement here:

I had been looking forward to attending tonight's MIT SlateStarCodex meetup as I hardly ever look forward to anything. Alas, I'm now stuck in Chicago, with my flight cancelled due to snow, and with all flights for the next day booked up. But instead of continuing to be depressed about it, I've decided to be happy that this meetup is even happening at all—that there's a community of people who can read, let's say, a hypothetical debate moderator questioning Ben Carson about what it's like to be a severed half-brain, and simply be amused, instead of silently trying to figure out who benefits from the post and which tribe the writer belongs to. (And yes, I know: the answer is the gray tribe.)
And you can find this community anywhere—even in Cambridge, Massachusetts! Look, I spend a lot of time online, just getting more and more upset reading social justice debates that are full of people calling each other douchebags without even being able to state anything in the same galactic supercluster as the other side’s case. And then what gives me hope for humanity is to click over to the slatestarcodex tab, and to see all the hundreds of comments (way more than my blog gets) by people who disagree with each other but who all basically get it, who all have minds that don’t make me despair. And to realize that, when Scott Alexander calls an SSC meetup, he can fill a room just about anywhere … well, at least anywhere I would visit. So talk, be merry, and be rational. I’m now back in town, and told by people who attended the meetup that it was crowded, disorganized, and great. And now I’m off to Harvard, to attend the other Scott A.’s talk “How To Ruin A Perfectly Good Randomized Controlled Trial.” Update (Nov. 24) Scott Alexander’s talk at Harvard last night was one of the finest talks I’ve ever attended. He was introduced to rapturous applause as simply “the best blogger on the Internet,” and as finally an important speaker, in a talk series that had previously wasted everyone’s time with the likes of Steven Pinker and Peter Singer. (Scott demurred that his most notable accomplishment in life was giving the talk at Harvard that he was just now giving.) The actual content, as Scott warned from the outset, was “just” a small subset of a basic statistics course, but Scott brought each point alive with numerous recent examples, from psychiatry, pharmacology, and social sciences, where bad statistics or misinterpretations of statistics were accepted by nearly everyone and used to set policy. (E.g., Alcoholics Anonymous groups that claimed an “over 95%” success rate, because the people who relapsed were kicked out partway through and not counted toward the total.) 
Most impressively, Scott leapt immediately into the meat, ended after 20 minutes, and then spent the next two hours just taking questions. Scott is publicity-shy, but I hope for others’ sake that video of the talk will eventually make its way online. Then, after the talk, I had the honor of meeting two fellow Boston-area rationalist bloggers, Kate Donovan and Jesse Galef. Yes, I said “fellow”: for almost a decade, I’ve considered myself on the fringes of the “rationalist movement.” I’d hang out a lot with skeptic/effective-altruist/transhumanist/LessWrong/OvercomingBias people (who are increasingly now SlateStarCodex people), read their blogs, listen and respond to their arguments, answer their CS theory questions. But I was always vaguely uncomfortable identifying myself with any group that even seemed to define itself by how rational it was compared to everyone else (even if the rationalists constantly qualified their self-designation with “aspiring”!). Also, my rationalist friends seemed overly interested in questions like how to prevent malevolent AIs from taking over the world, which I tend to think we lack the tools to make much progress on right now (though, like with many other remote possibilities, I’m happy for some people to work on them and see if they find anything interesting). So, what changed? Well, in the debates about social justice, public shaming, etc. that have swept across the Internet these past few years, it seems to me that my rationalist friends have proven themselves able to weigh opposing arguments, examine their own shortcomings, resist groupthink and hysteria from both sides, and attack ideas rather than people, in a way that the wider society—and most depressingly to me, the “enlightened, liberal” part of society—has often failed. 
In a real-world test (“real-world,” in this context, meaning social media…), the rationalists have walked the walk and rationaled the rational, and thus they’ve given me no choice but to stand up and be counted as one of them. Have a great Thanksgiving, those of you in the US!

Another Update: Dana, Lily, and I had the honor of having Scott Alexander over for dinner tonight. I found this genius of human nature, who took so much flak last year for defending me, to be completely uninterested in discussing anything related to social justice or online shaming. Instead, his gaze was fixed on the eternal: he just wanted to grill me all evening about physics and math and epistemology. Having recently read this Nature News article by Ron Cowen, he kept asking me things like: “you say that in quantum gravity, spacetime itself is supposed to dissolve into some sort of network of qubits. Well then, how does each qubit know which other qubits it’s supposed to be connected to? Are there additional qubits to specify the connectivity pattern? If so, then doesn’t that cause an infinite regress?” I handwaved something about AdS/CFT, where a dynamic spacetime is supposed to emerge from an ordinary quantum theory on a fixed background specified in advance. But I added that, in some sense, he had rediscovered the whole problem of quantum gravity that’s confused everyone for almost a century: if quantum mechanics presupposes a causal structure on the qubits or whatever other objects it talks about, then how do you write down a quantum theory of the causal structures themselves? I’m sure there’s a lesson in here somewhere about what I should spend my time on.

### AWS

#### AWS Week in Review – November 16, 2015

Let’s take a quick look at what happened in AWS-land last week:

Monday November 16

• The AWS Security Blog announced that the AWS Security Token Service (STS) Is Now Active by Default in All AWS Regions.
• Eric Hammond talked about Using AWS CodeCommit with Git Repositories in Multiple AWS Accounts.
• The Cloudyn Blog analyzed EC2 Price Drops over the Past 5 Years.
• Mislav Stipetic noted that AWS is Awesome and discussed AWS Lambda.
• João Parreira talked about Receiving AWS IoT Messages in Your Browser Using Websockets.
• We published a new whitepaper: Minimizing Variable Costs for Shared Data.
• The Cloud Enlightened Blog showed you how to Save AWS Costs by Scheduled Start and Stop of EC2 Instances.

Tuesday November 17

• The AWS Security Blog announced a Preview of SMS MFA for IAM Users.
• The Amazon Mobile App Distribution Blog published Part 3 of an article on Building Unity Games for Fire TV.
• The AWS Startup Collection shared Three Internal Software Tools Every Startup Should Use.
• Stelligent shared an AWS Continuous Delivery Demo Screencast.
• The Cloud Academy Blog discussed AWS Network ACLs and Subnets for Network Level Security.
• Cloud Technology Partners noted that Stats are Wrong: The Public Cloud is Already the Norm.
• Segment talked about Automating Our Infrastructure to Empower Engineers.
• Julien Blanchard discussed Rust on AWS Lambda.

Wednesday November 18

Thursday November 19

• We launched the AWS SDK for Go.
• We announced that the AWS Device Farm can now Test Web Apps on Mobile Devices.
• We announced an Amazon EMR Update with Apache Spark 1.5.2, Ganglia, Presto, Zeppelin, and Oozie.
• We announced Detailed Health Metrics in the Console for AWS Elastic Beanstalk.
• We announced Real-Time Predictions in the Amazon Machine Learning Console.
• We announced that RDS PostgreSQL now Supports Point-and-Click Upgrade from PostgreSQL 9.3 to 9.4.
• The AWS Partner Network Blog invited you to Discover Exclusive New Content and Webcasts on the AWS Partner Portal.
• We updated the AWS CLI, the AWS SDK for Ruby, the AWS SDK for Java, and the AWS SDK for JavaScript.
• The Amazon Mobile App Distribution Blog showed you how to Build Your First Mobile App – No Coding Experience Required.
• The Amazon Mobile App Distribution Blog shared a Dev Chat with NeuroNation: Driving User Engagement and Monetization with Amazon Underground.
• The AWS Startup Collection discussed SafeDK: Giving Control Back to App Developers in an SDK-Fueled World.
• Cloudability told you Why Your Org Needs a Reserved Instances Czar.
• The DZone Cloud Zone talked about ECS at Coursera: Powering a General-Purpose Near-Line Execution Microservice.

Friday November 20

• We announced Saved Reports for the AWS Cost Explorer.
• We announced that Amazon Redshift now Supports Modifying Cluster Accessibility and Specifying Sort Order for NULL Values.
• The AWS Security Blog announced that AWS Completed a Successful SOC Assessment with 3 New Services in Scope.
• The DZone Cloud Zone talked about The Science of Saving Money on Reserved Instances.
• Colin Percival launched a FreeBSD AMI Builder AMI.

New & Notable Open Source

• goofyfs is a filey (their terminology) system for S3.
• aws-sdk-perl is an attempt to build a complete AWS SDK in Perl.
• aws-ses-recorder is a set of Lambda functions to process SES.
• flywheel is a proxy for AWS.
• aws-sdk-config-loader is an AWS config file loader for the CLI tools.
• caravan is a lightweight Python framework for SWF.
• rusoto is a set of AWS client libraries for Rust.
• ng-upload-s3 is an AngularJS directive to upload files directly to S3.
• aws-templates is a collection of custom CloudFormation templates.
• ec2-browser is an EC2 browser.
• Consigliere is an AWS Trusted Advisor dashboard that supports multiple accounts.

New SlideShare Presentations

New Customer Success Stories

New YouTube Videos

Upcoming Events

Help Wanted

Stay tuned for next week! In the meantime, follow me on Twitter and subscribe to the RSS feed.

Jeff;

### QuantOverflow

#### Completeness and Hedging Question

A question in some private notes I'm struggling to work through (exam prep). (iii) is where I hit a wall with my understanding and I'm lost thereafter.
Any help/clarification gratefully received.

...

Consider a financial market with $d = 1$ risky security, whose price $S^1$ is determined by $$dS_t^1 = (1/{S_t^1})dt + dW_t^1$$ In addition, the risk-free rate is $r_f = 0$, so the price of the money-market account is $S^0 = 1$.

i) Derive an expression for the market price of risk

ii) Derive a stochastic differential equation for the numeraire portfolio

iii) Use the real-world pricing formula to derive an expression for the price of a zero-coupon bond (ZCB) with a face value of $1 (Note: Notice the ZCB is an equity derivative in this model!)

iv) Derive an expression for the weights of the hedge portfolio for the ZCB

v) Consider a portfolio consisting of a long position in the hedge portfolio above, and a short position in the money-market account, structured in such a way that its initial value is zero. What is the final payoff of this portfolio? What type of arbitrage is it?

vi) Does the model in question satisfy NA$_+$? Justify.

vii) Does the model in question satisfy NA? Justify.

viii) Does the model in question satisfy NUPBR? Justify.

ix) Does the model in question satisfy NFLVR? Justify.
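For reference, my working for part (i), which I believe is right: reading the drift $\mu_t = 1/S_t^1$ and unit volatility straight off the SDE, and using $r_f = 0$,

$$\theta_t = \frac{\mu_t - r_f}{\sigma_t} = \frac{1/S_t^1 - 0}{1} = \frac{1}{S_t^1}$$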

### StackOverflow

#### How to use a cross validation test with MATLAB?

I would like to use 10-fold Cross-validation to evaluate a discretization in MATLAB. I should first consider the attributes and the class column.
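To pin down what I mean by the split before worrying about MATLAB syntax, here is a plain-Python sketch on made-up toy data (I believe cvpartition is the MATLAB counterpart, but I'm not sure of the exact call):

```python
# Sketch of a 10-fold split over row indices: each fold serves once as the
# test set; the attributes and class column are then sliced by these indices.
def kfold_indices(n_samples, k=10):
    """Yield (train, test) index lists; fold sizes differ by at most 1."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# 20 rows -> ten folds, each holding out 2 rows for evaluation.
folds = list(kfold_indices(20, k=10))
```

Each discretization would be fitted on the train indices and evaluated on the held-out test indices, with the ten scores averaged.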

### QuantOverflow

#### portfolio optimization averaging weights

I'm playing around with different portfolio optimization techniques. Amongst others I was also looking at the resampling method, especially the one described in Meucci. I have two general questions regarding this technique.

1. Question: I would like to know what more experienced people think about resampling methods. I've noticed that there is a controversial discussion on this site as well as in the research literature; for example, Scherer's critique. On the other hand, Meucci also points out some advantages and mentions the wide usage of this technique in industry. In my opinion, or from what I've seen running some experiments, there is no additional gain in return, but the weights are much more stable when you have to recalibrate them.

2. Question: averaging the resampled weights, i.e. say for specific portfolios specified by a certain desired risk level, often leads to weights which no longer sum to one, even after removing "outliers", i.e. the more extreme values. Have you seen this issue as well? How is this resolved in the industry?
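A toy numpy illustration of the effect I'm seeing (weights made up): each row is the optimal weight vector from one resampled run at a fixed risk level. Trimming the extreme value per asset independently is exactly what breaks the sum-to-one property; afterwards I simply renormalize, but I don't know whether that's the accepted practice:

```python
import numpy as np

# Each row: weights from one resampled optimization (each sums to 1).
resampled = np.array([
    [0.50, 0.30, 0.20],
    [0.45, 0.35, 0.20],
    [0.90, 0.02, 0.08],   # an "extreme" draw
    [0.48, 0.32, 0.20],
])

# Dropping the largest value per asset *independently* before averaging
# breaks the sum-to-one property of the averaged weights.
trimmed_mean = np.sort(resampled, axis=0)[:-1].mean(axis=0)  # sums to 0.85

# The naive fix: renormalize the averaged weights.
weights = trimmed_mean / trimmed_mean.sum()
```

Note that averaging whole weight vectors (without per-asset trimming) would preserve the sum, since the mean of rows that each sum to one also sums to one.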

### Fefe

#### The security advisory of the week comes from the Wyoming ...

The security advisory of the week comes from the Wyoming Oil and Gas Conservation Commission:
RECOMMEND YOU DO NOT USE SPECIAL CHARACTERS IN THE DATA YOU ARE SUBMITTING!

Such as ~,,!,@,#,$,%,^,&,*,+,=,?,'," as they are interpreted by SQL and may cause you errors!

#### Remember the pictures of the military in the ...

Remember the pictures of the military in the streets of Boston after the attack on the marathon? When curfews had been imposed and the deserted streets were patrolled by military convoys? I have to think of that when I see this current picture from Brussels, even if the people there seem to be taking it relatively calmly. Dizzying, how quickly the blanket of civilization slides off the table when someone just tugs a little at one end.

#### A Russian court has now banned Scientology. ...

A Russian court has now banned Scientology. The background was apparently the many trademarks that Scientology has registered on terms of its "religion".

#### Saudi Arabia is burning through its resources at a pretty ...

Saudi Arabia is burning through its resources at a pretty rapid pace. The wall at the end of the toboggan run is already in sight.

#### You can tell a good admin by the fact that he automates everything, ...

### CompsciOverflow

#### Reduction Vertex-Cover to MAX-2SAT

Any ideas how to reduce Vertex Cover to Max-2SAT? Where Max-2SAT = {$(\varphi,k): \varphi\in 2CNF,\ \exists\ assignment\ satisfying \geq k\ clauses$}. Will constructing $\varphi$ such that it consists of $m=|E|$ 2CNF clauses, one $(x_{u} \vee x_{v})$ clause per edge $(u,v)\in E$, do the job?

#### SQL (Structured Query Language) Statement

Given the database

Students (sID, sFName, sLName, sEmail);
Courses (cID, cName, cr);
Profs (pID, pFName, pMName, pLName, pRank, pEmail);
Sections (cID, sectID, pID) where FOREIGN KEY (cID) REFERENCES Courses ON DELETE CASCADE, FOREIGN KEY (pID) REFERENCES Profs;
Enrolled (sID, cID, sectID, grade) where FOREIGN KEY (sID) REFERENCES Students, FOREIGN KEY (cID, sectID) REFERENCES Sections;

a. Find the section IDs of COMP 4060 not taken by student with ID 6700001.
Select sectID From Enrolled Where cID = 'Comp 4060'
Except
Select sectID From Enrolled Where sID = 6700001

b. Find the IDs of students who take all sections of COMP 4060.

Select sID From Enrolled Where cID = 'Comp 4060'
Group by sectID, cID

Is that how you write it?

### QuantOverflow

#### How to estimate CVA by valuing a CDS of the counterparty?

I'm trying to estimate the CVA of one of my derivatives by valuing a credit default swap (CDS) of my counterparty. However, I don't know how to set up the CDS deal (notional amount, maturity, etc.). Thanks!

### Lobsters

#### State of Mercurial at Mozilla

### StackOverflow

#### Weka IBk parameter details (distanceWeighting, meanSquared)

I am using the kNN algorithm to classify. In Weka they have provided various parameter settings for kNN. I am interested to know about distanceWeighting and meanSquared. In distanceWeighting we have three values (No distance weighting, weight by 1/distance and weight by 1-distance). What are these values and what is their impact? Can someone please explain them to me? :)

#### How can I implement a custom RNN (specifically an ESN) in Tensorflow?

I am trying to define my own RNNCell (Echo State Network) in Tensorflow, according to the definition below.

x(t + 1) = tanh(Win*u(t) + W*x(t) + Wfb*y(t))
y(t) = Wout*z(t)
z(t) = [x(t), u(t)]

x is state, u is input, y is output. Win, W, and Wfb are not trainable. All weights are randomly initialized, but W is modified like this: "Set a certain percentage of elements of W to 0, scale W to keep its spectral radius below 1.0." I have this code to generate the equation.
x = tf.Variable(tf.reshape(tf.zeros([N]), [-1, N]), trainable=False, name="state_vector")
W = tf.Variable(tf.random_normal([N, N], 0.0, 0.05), trainable=False)
# TODO: setup W according to the ESN paper
W_x = tf.matmul(x, W)
u = tf.placeholder("float", [None, K], name="input_vector")
W_in = tf.Variable(tf.random_normal([K, N], 0.0, 0.05), trainable=False)
W_in_u = tf.matmul(u, W_in)
z = tf.concat(1, [x, u])
W_out = tf.Variable(tf.random_normal([K + N, L], 0.0, 0.05))
y = tf.matmul(z, W_out)
W_fb = tf.Variable(tf.random_normal([L, N], 0.0, 0.05), trainable=False)
W_fb_y = tf.matmul(y, W_fb)
x_next = tf.tanh(W_in_u + W_x + W_fb_y)
y_ = tf.placeholder("float", [None, L], name="train_output")

My problem is two-fold. First, I don't know how to implement this as a subclass of RNNCell. Second, I don't know how to generate a W tensor according to the above specification. Any help with either of these questions is greatly appreciated. Maybe I can figure out a way to prepare W, but I sure as hell don't understand how to implement my own RNN as a subclass of RNNCell.

#### Forecasting Waves

I have an interesting forecasting problem. I have a set of waves that I am trying to filter and recombine (see link). I was thinking of using a moving average for each, but this doesn't give the results I would like.

Image of waves to be forecast

I think the waves are really a sinusoid around a moving average, which would give a better result, but the issue is that the moving average doesn't easily allow you to forecast where it is going. I would be interested if anyone knows a good model for estimating each wave that would give better forecasting performance.

### CompsciOverflow

#### How can I prove that every NFA can be converted to an equivalent NFA that has only one accepting state? [on hold]

I don't even know how to begin! If you can give me a hint on how to start proving that, I would appreciate it.
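As a hint, the standard construction is short to state: add one fresh state $f$, add an ε-transition from every old accepting state to $f$, and make $f$ the sole accepting state. A small Python sketch (my own code, not from the question; the NFA encoding — a transitions dict keyed by `(state, symbol)`, with `''` standing for ε — is an assumption of this sketch):

```python
def one_accept(nfa):
    """Add a fresh state f, an epsilon-transition from every old accepting
    state to f, and make f the only accepting state."""
    states, delta, start, accepting = nfa
    f = max(states) + 1
    new_delta = {k: set(v) for k, v in delta.items()}
    for q in accepting:
        new_delta.setdefault((q, ''), set()).add(f)  # '' denotes epsilon
    return (states | {f}, new_delta, start, {f})

def _eps_closure(delta, S):
    stack, seen = list(S), set(S)
    while stack:
        for r in delta.get((stack.pop(), ''), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(nfa, word):
    """Subset simulation of the NFA, with epsilon-closures at each step."""
    states, delta, start, accepting = nfa
    cur = _eps_closure(delta, {start})
    for ch in word:
        cur = _eps_closure(delta, {r for q in cur for r in delta.get((q, ch), ())})
    return bool(cur & accepting)

# Example: an NFA with two accepting states (accepts any nonempty word over {a, b})
orig = ({0, 1, 2},
        {(0, 'a'): {0, 1}, (0, 'b'): {0, 2}},
        0, {1, 2})
conv = one_accept(orig)
```

The proof obligation is then exactly that `conv` accepts the same language as `orig`: any accepting run of the old NFA extends by one ε-step into $f$, and any run reaching $f$ must have passed through an old accepting state.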
### Planet Theory

#### Telephone primes

This weekend I helped add a new sequence to OEIS, A264737, of the prime numbers that divide at least one telephone number (the numbers of matchings in a complete graph, etc.). The telephone numbers obey a simple recurrence T(n) = T(n-1) + (n-1)T(n-2), and it's easy to test whether a prime number p divides at least one telephone number by running this recurrence modulo p. Whenever n is 1 mod p, the right hand side of the recurrence simplifies to T(n-1) mod p, and we get two consecutive numbers that are equal mod p; after that point, the recurrence continues as it would from its initial conditions (two consecutive ones), multiplied mod p by some unknown factor. Therefore, the recurrence mod p either repeats exactly with period p, or it becomes identically zero (as it does for p=2), or it repeats with a higher period that is a multiple of p and a divisor of p(p–1), where all sub-periods of length p are multiples of each other. In particular, if p divides at least one telephone number, it divides infinitely many of them, whose positions are periodic with period p.

All primes divide at least one Fibonacci number (a sequence of numbers with an even simpler recurrence), but that is not true for the telephone numbers. For instance, the telephone numbers mod 3 form the infinite repeating sequence 1,1,2,1,1,2,... with no zeros.

So how many of the prime numbers are in the new sequence? A heuristic estimate suggests that the telephone primes should form a 1–1/e fraction of all primes (around 63.21%): p is a telephone prime when there is a zero in the first p terms of the recurrence sequence mod p, and if we use random numbers instead of the actual recurrence then the probability of not getting a zero is approximately 1/e. With this estimate in mind, I tried some computational experiments and found that among the first 10000 primes, 6295 of them (approximately 63%) are in the sequence. Pretty accurate, I think!
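The experiment is easy to replicate; a quick sketch (my own code, not from the post) runs the recurrence mod p and looks for a zero within the first full period:

```python
def is_telephone_prime(p):
    # Run T(n) = T(n-1) + (n-1)*T(n-2) mod p; by the periodicity argument
    # above, p divides some telephone number iff a zero appears within
    # roughly the first p terms.
    a, b = 1, 1  # T(0), T(1)
    for n in range(2, p + 2):
        a, b = b, (b + (n - 1) * a) % p
        if b == 0:
            return True
    return False

def primes(limit):
    """Simple sieve of Eratosthenes up to limit, inclusive."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, ok in enumerate(sieve) if ok]

ps = primes(7919)  # the first 1000 primes
frac = sum(map(is_telephone_prime, ps)) / len(ps)
```

On this smaller sample, `frac` should again land near the predicted 1 − 1/e ≈ 0.632.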
But I have no idea how to approach a rigorous proof that this estimate should be correct.

Incidentally, while looking up background material for this I ran into a paper by Rote in 1992 that observes a relationship between the telephone numbers and another sequence, A086828. A086828 counts the number of states in a dynamic programming algorithm for the traveling salesman problem on graphs of bandwidth k, for a parameter k. So its calculation, in the mid-1980s, can be seen as an early example of the parameterized analysis of algorithms. It has the same recurrence relation as the telephone numbers, but with different initial conditions, so we can consider using this sequence instead of the telephone numbers. But the same analysis above showing that all subperiods of length p are similar applies equally well to this sequence, showing that after an initial transient of length p, all subperiods are either identically zero or similar to the corresponding subperiods of the telephone numbers. So if we ask which primes divide at least one member of A086828, we get almost the same answer, except possibly for some additional primes that either divide one of the first p numbers of A086828 (and then no other members of A086828 later in the sequence) or that divide all but finitely many members of A086828.

### QuantOverflow

#### Adding Asset Weights To Cholesky Output - Monte Carlo in VBA

I am looking to create a Monte Carlo generator in Excel to plot correlated asset paths for a portfolio containing 1 to 10 assets. I have the correlation matrix for all 10 assets and have performed the Cholesky decomposition to obtain the lower N×N matrix output using some VBA code. I am looking for some guidance on when/how to incorporate the asset weights of the portfolio into my path generation. As an example, for a three-asset portfolio I generate three separate series of random variables using the NORMSINV(RAND()) function (denoted RN below) to return the sigma.
Then multiply each random variable by the corresponding Cholesky output and sum the series to get a correlated random variable:

CRV (correlated random variable) = RN1*chol1 + RN2*chol2 + RN3*chol3

I then found instructions on setting up the drift and volatility terms to generate a series of the log of prices factoring in the above input:

Log of Prices = LN(Starting Price) + (Drift - 0.5*volatility*volatility) + volatility*CRV

Where do I factor in my asset weights? Let's say 20/30/50 for a three-asset portfolio?

### CompsciOverflow

#### Method of inductive statements for proving partial correctness of block-schemes

I'm trying to find an explanation and more information on a method, plus some example problems with solutions using that method. The method's name doesn't seem to translate well into English (I'm from a foreign country, and there isn't really any info on it in the textbook). So the info I have to help you is: "Method of inductive statements for proving partial correctness of block-schemes." This is the title, and later on we have: "Inductive statements method of Floyd." The algorithm basically says we make "cutting" points on the arrows (we treat the block-scheme as a graph) and then use induction, showing that if a statement holds for $A_i$ and we go from cut $i$ to cut $j$, then some other statement should hold for $A_j$, and so on. It's not particularly well written in the textbook and there aren't any problem/solution examples. So any link to an explanation of the method and examples of solving problems with it would be great. Thanks in advance :)

#### Prove one-wayness of f(f(x)) and f(x) || f(f(x)) given that f is one-way [on hold]

Given a one-way function $f$, I need to verify whether $f(f(x))$ and $f(x) \| f(f(x))$ are still one-way. In my opinion the answer is trivially yes, but maybe I have made too superficial an interpretation.

#### What is a brief but complete explanation of a pure/dependent type system?
If something is simple, then it should be completely explainable with a few words. This can be done for the λ-calculus: the λ-calculus is a syntactical grammar (basically, a structure) with a reduction rule (which means a search/replace procedure is repeatedly applied to every occurrence of a specific pattern until no such pattern exists).

Grammar:

Term = (Term Term) | (λ Var . Term) | Var

Reduction rule:

((λ var body) term) -> SUBS(body, var, term)
where SUBS replaces all occurrences of var by term in body, avoiding name capture.

Examples:

(λ a . a) -> (λ a a)
((λ a . (λ b . (b a))) (λ x . x)) -> (λ b . (b (λ x x)))
((λ a . (a a)) (λ x . x)) -> (λ x . x)
((λ a . (λ b . ((b a) a))) (λ x . x)) -> (λ b . ((b (λ x . x)) (λ x . x)))
((λ x . (x x)) (λ x . (x x))) -> never halts

While somewhat informal, one could argue this is informative enough for a normal human to understand the λ-calculus as a whole - and it takes 22 lines of markdown.

I'm trying to understand pure/dependent type systems as used by Idris/Agda and similar projects, but the briefest explanation I could find was Simply Easy - a great paper, but one that seems to assume a lot of prior knowledge (Haskell, inductive definitions) that I don't have. I think something briefer and less rich could eliminate some of those barriers. Thus: is it possible to give a brief, complete explanation of pure/dependent type systems, in the same format I presented the λ-calculus above?

### DataTau

#### The Black Friday Puzzle - Understanding Markov Chains

#### Refinery -- user-friendly interface to do NLP topic modelling with LDA

#### Interviews - DS Lore

#### Pandas 0.17.1 is out with conditional HTML formatting

#### Building Analytics at Simple

### QuantOverflow

#### market change, correlation and estimation bias

I hear many quants saying that markets change very slowly.
This "fact" is even presented as a justification of statistical arbitrage, for example by affirming that correlations remain roughly the same for long periods, and that the insight given by these correlations, or by a PCA applied to the correlation matrix, is therefore valid through time. My question is: indeed, the correlation matrix does not change on a daily basis, but isn't that due to an estimation bias? When the estimation is based on the last 250 days, for example, any new day's contribution is very small and does not dramatically change the estimator, and the real correlation may be much more stochastic than the stable matrix estimated. If this is the case, how come this artificially stable correlation matrix can give profitable trading strategies relying on it?

### Lobsters

#### Android Studio 2.0 Preview

#### Building Sustainable Open Source Businesses

#### How to Charge for your Open Source

#### Breaking the fourth wall with Minecraft

### TheoryOverflow

#### Writing unitary transformations for non-adjacent qubits

I understand that we can always come up with equivalent unitary transformations when the order of the qubits changes. If the unitary transformation is working on a number of adjacent qubits, we can just pad it with identity matrices for the qubits on which no unitary transformation is acting. On the other hand, if a qubit that is not subject to any transformation lies between the qubits that are, the unitary transformation gets bigger. For example, if there is such an input qubit between the two control qubits of a Toffoli gate, the unitary transform becomes as follows.
$$T = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ \end{pmatrix}$$

Another example is the following quantum division circuit, taken from here. The QFT circuit has a control qubit $|q_n\rangle$, but between it and the input to the QFT there is a number of non-participating qubits $|D\rangle$ of length $n$. I can definitely write a unitary transformation which will always keep $|D\rangle$ unaffected but do the job. But I feel that the matrix will be unnecessarily big. There should be a systematic way whereby I can permute the qubits so that the qubits input to the QFT become adjacent, and after the unitary transformation and "unpermutation" they go back to their original positions. I can write my permutation matrices, but is there already a systematic way to do this?

### StackOverflow

#### Swift Functional Programming: JSON parsing

I'm relatively new to functional programming with Swift and am trying to parse JSON using this blog post I found by Chris Eidhof.
The problem I run into is when I try to parse to a struct with an optional property, e.g. in my example I changed 'let name: String' to 'let name: String?'. Here is Chris Eidhof's code, with the small modification that I make the name property optional (and removed extraneous code). I get the error: Function signature Int -> String? -> Blog is not compatible with the expected type Int -> (String -> Blog). This makes sense to me, as I understand it's expecting 'String' and not 'String?'. The error appears in the parseBlog function.

import Foundation

let parsedJSON : [String:AnyObject] = [
    "stat": "ok",
    "blogs": [
        "blog": [
            ["id" : 73, "name" : "Bloxus test"],
            ["id" : 74, "name" : "Manila Test"]
        ]
    ]
]

struct Blog {
    let id: Int
    let name: String?
}

func parseBlog(blog: AnyObject) -> Blog? {
    let mkBlog = curry { id, name in Blog(id: id, name: name) }
    return asDict(blog) >>>= {
        mkBlog <*> int($0, key: "id")
               <*> string($0, key: "name")
    }
}

func parseJSON() {
    let blogs = dictionary(parsedJSON, key: "blogs") >>>= {
        array($0, key: "blog") >>>= {
            join($0.map(parseBlog))
        }
    }
    print("posts: \(blogs)")
}

func toURL(urlString: String) -> NSURL {
    return NSURL(string: urlString)!
}

func asDict(x: AnyObject) -> [String:AnyObject]? {
    return x as? [String:AnyObject]
}

func join<A>(elements: [A?]) -> [A]? {
    var result : [A] = []
    for element in elements {
        if let x = element {
            result += [x]
        } else {
            return nil
        }
    }
    return result
}

infix operator <*> { associativity left precedence 150 }
func <*><A, B>(l: (A -> B)?, r: A?) -> B? {
    if let l1 = l {
        if let r1 = r {
            return l1(r1)
        }
    }
    return nil
}

func flatten<A>(x: A??) -> A? {
    if let y = x {
        return y
    }
    return nil
}

func array(input: [String:AnyObject], key: String) -> [AnyObject]? {
    let maybeAny : AnyObject? = input[key]
    return maybeAny >>>= { $0 as? [AnyObject] }
}

func dictionary(input: [String:AnyObject], key: String) -> [String:AnyObject]? {
    return input[key] >>>= { $0 as? [String:AnyObject] }
}

func string(input: [String:AnyObject], key: String) -> String? {
    return input[key] >>>= { $0 as? String }
}

func number(input: [NSObject:AnyObject], key: String) -> NSNumber? {
    return input[key] >>>= { $0 as? NSNumber }
}

func int(input: [NSObject:AnyObject], key: String) -> Int? {
    return number(input, key: key).map { $0.integerValue }
}

func bool(input: [NSObject:AnyObject], key: String) -> Bool? {
    return number(input, key: key).map { $0.boolValue }
}

func curry<A,B,R>(f: (A,B) -> R) -> A -> B -> R {
    return { a in { b in f(a,b) } }
}

infix operator >>>= {}
func >>>= <A,B> (optional : A?, f : A -> B?) -> B? {
    return flatten(optional.map(f))
}

I think the cause is the <*> operator, which is a function call that only gets executed when all optional values are non-nil. Hence I created a new operator <?>, which is similar to <*> but allows the input to be optional (I think):

infix operator <?> { associativity left precedence 150 }
func <?><A, B>(l: (A? -> B)?, r: A?) -> B? {
    if let l1 = l {
        return l1(r)
    }
    return nil
}

If I then change the parseBlog function to the following, the error disappears:

func parseBlog(blog: AnyObject) -> Blog? {
    let mkBlog = curry { id, name in Blog(id: id, name: name) }
    return asDict(blog) >>>= {
        mkBlog <*> int($0, key: "id")
               <?> string($0, key: "name")
    }
}

I am not quite sure why this works, as I am still wrapping my head around some of these concepts. I have a vague understanding of why <?> works but can't quite put it into words. Any clarity would be really appreciated!

### DragonFly BSD Digest

#### DragonFly 4.4 release candidate branched

The next release of DragonFly is coming due, since it’s been 6 months. I just tagged 4.4RC, and I’ll have an image built soon. Current estimate is that we’ll have the 4.4-RELEASE at the end of the month.

### QuantOverflow

#### Does GARCH derived variance explain the auto-correlation in a time series?

Given a time series of returns $u_i$, where $i = 1$ to $t$, $\sigma_i$ is calculated from GARCH(1,1) as $\sigma_i^2 = w + \alpha u_{i-1}^2 + \beta \sigma_{i-1}^2$. What is the mathematical basis to say that $u_i^2/\sigma_i^2$ will exhibit little auto-correlation in the series?

Hull's book Options, Futures and Other Derivatives is an excellent reference. In Hull 6th Ed, p. 470, "How Good is the Model?", he states that "If a GARCH model is working well, it should remove the auto-correlation. We can test whether it has done so by considering the auto-correlation structure for the variables $u_i^2/\sigma_i^2$. If these show very little auto-correlation our model for $\sigma_i$ has succeeded in explaining auto-correlation in the $u_i^2$."

Maximum log-likelihood estimation for the variance ends with maximizing $$-m \, \ln(v) - \sum_{i=1}^{t} u_i^2/v_i$$ where $v_i$ is the variance $\sigma_i^2$. This function does not really mean $u_i^2/v_i$ is being minimized, because $-\ln(v_i)$ gets larger, and so does $u_i^2/v_i$, as $v_i$ gets smaller. However, it makes intuitive sense that dividing the return $u_t$ by its (instantaneous or regime) volatility explains away the volatility-related component of the time series. I am looking for a mathematical or logical explanation of this.
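Hull's claim is at least easy to check empirically. A small simulation (my own sketch, not from the question or from Hull; the parameter values are illustrative) generates a GARCH(1,1) series and compares the lag-1 autocorrelation of $u_i^2$ with that of the standardized $u_i^2/\sigma_i^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
w, alpha, beta = 1e-5, 0.1, 0.85   # illustrative GARCH(1,1) parameters
n = 20000
u = np.empty(n)
v = np.empty(n)
v[0] = w / (1 - alpha - beta)       # start at the long-run variance
u[0] = np.sqrt(v[0]) * rng.standard_normal()
for i in range(1, n):
    v[i] = w + alpha * u[i - 1] ** 2 + beta * v[i - 1]
    u[i] = np.sqrt(v[i]) * rng.standard_normal()

def acf1(x):
    """Lag-1 sample autocorrelation."""
    x = x - x.mean()
    return (x[:-1] * x[1:]).sum() / (x * x).sum()

raw = acf1(u ** 2)       # autocorrelated: variance clusters persist
std = acf1(u ** 2 / v)   # u_i^2 / sigma_i^2 is i.i.d. chi-squared, so ~0
```

With the true $\sigma_i^2$ used for standardization, $u_i^2/\sigma_i^2$ is an i.i.d. $\chi^2_1$ sequence by construction, so its autocorrelation is pure noise around zero, while the raw squared returns inherit persistence from $\alpha + \beta = 0.95$.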
I think Hull is not very accurate here, as the time series may have trends, etc.; also, there are better approaches to finding i.i.d. variables in the time series than using $u_i^2/\sigma_i^2$ alone. I particularly like Filtering Historical Simulation - Backtest Analysis by Barone-Adesi (2000) FHS

### TheoryOverflow

#### Temporal and spatial locality when using linked list?

Comment on the probability of seeing good temporal and spatial locality when using a singly linked list. Explain.

Spatial locality: the chance of referencing a particular resource is higher if a resource near it was just referenced. If the linked lists are used more like arrays, with nodes allocated in consecutive locations and not too many insertions/deletions in the middle, then prefetching can fetch the next nodes as a side effect and caches still provide spatial locality.

Temporal locality: a resource that is referenced at one point in time will be referenced again sometime in the near future. We can use loops to traverse a singly linked list to search for specific elements. Each time we traverse the linked list, the first node is always referenced.

This is all I can come up with. Any suggestions? Is there a way that we might be able to improve spatial locality in singly linked lists?

### CompsciOverflow

#### Computer won't boot up [on hold]

Whenever I try to turn on my computer, it won't boot into the BIOS (I pressed F8 at start) or Windows or anything like that. It's a black screen. I think some kind of virus messed around with my MBR or my hard drive. My dad managed to boot into Windows by inserting an Ubuntu CD (I don't know how that works), but ever since the last time it won't boot up again, even with the CD. When I got into Windows the last time, something had changed my settings, such as inverted mouse clicks, without my permission, so I just don't turn it on again. I just want to know if there is a way to bring everything back to normal.
I know it's weird, but I described my situation as best I can.

### StackOverflow

#### Good experimental practice for comparing the performance of different classification algorithms for a specific task [on hold]

I was revising old exam papers and found this question:

Discuss good experimental practice for comparing the performance of different classification algorithms for a specific task. As part of your answer, explain what training, testing and validation datasets are.

Is the answer learning curves? That's the only thing I can think of.

#### Distributed system facebook

Does anyone know how to answer these questions, or where I would find information on this?

How does Facebook deal with timing issues/synchronisation? How does Facebook deal with multiple architectures/file systems? How does Facebook deal with critical resources and the fair and equitable sharing of the same? Does Facebook behave as a real-time system as well as being a distributed system, and if so, how does Facebook manage deadlines? Is Facebook a client-server or peer-to-peer system, and if so, how? How is data transferred so that there is no corruption of data on Facebook?

#### Reproduce Fisher linear discriminant figure

Many books illustrate the idea of Fisher linear discriminant analysis using the following figure (this particular one is from Pattern Recognition and Machine Learning, p. 188).

I wonder how to reproduce this figure in R (or in any other language). Pasted below is my initial effort in R. I simulate two groups of data and draw the linear discriminant using the abline() function. Any suggestions are welcome.
set.seed(2014)
library(MASS)
library(DiscriMiner) # For scatter matrices

# Simulate bivariate normal distribution with 2 classes
mu1 <- c(2, -4)
mu2 <- c(2, 6)
rho <- 0.8
s1 <- 1
s2 <- 3
Sigma <- matrix(c(s1^2, rho * s1 * s2, rho * s1 * s2, s2^2), byrow = TRUE, nrow = 2)
n <- 50
X1 <- mvrnorm(n, mu = mu1, Sigma = Sigma)
X2 <- mvrnorm(n, mu = mu2, Sigma = Sigma)
y <- rep(c(0, 1), each = n)
X <- rbind(x1 = X1, x2 = X2)
X <- scale(X)

# Scatter matrices
B <- betweenCov(variables = X, group = y)
W <- withinCov(variables = X, group = y)

# Eigenvectors
ev <- eigen(solve(W) %*% B)$vectors
slope <- - ev[1,1] / ev[2,1]
intercept <- ev[2,1]

par(pty = "s")
plot(X, col = y + 1, pch = 16)
abline(a = slope, b = intercept, lwd = 2, lty = 2)


MY (UNFINISHED) WORK

I pasted my current solution below. The main question is how to rotate (and move) the density plot according to decision boundary. Any suggestions are still welcome.

require(ggplot2)
library(grid)
library(MASS)

# Simulation parameters
mu1 <- c(5, -9)
mu2 <- c(4, 9)
rho <- 0.5
s1 <- 1
s2 <- 3
Sigma <- matrix(c(s1^2, rho * s1 * s2, rho * s1 * s2, s2^2), byrow = TRUE, nrow = 2)
n <- 50
# Multivariate normal sampling
X1 <- mvrnorm(n, mu = mu1, Sigma = Sigma)
X2 <- mvrnorm(n, mu = mu2, Sigma = Sigma)
# Combine into data frame
y <- rep(c(0, 1), each = n)
X <- rbind(x1 = X1, x2 = X2)
X <- scale(X)
X <- data.frame(X, class = y)

# Apply lda()
m1 <- lda(class ~ X1 + X2, data = X)
m1.pred <- predict(m1)
# Compute intercept and slope for abline
gmean <- m1$prior %*% m1$means
const <- as.numeric(gmean %*% m1$scaling)
z <- as.matrix(X[, 1:2]) %*% m1$scaling - const
slope <- - m1$scaling[1] / m1$scaling[2]
intercept <- const / m1$scaling[2]
# Projected values
LD <- data.frame(predict(m1)$x, class = y)

# Scatterplot
p1 <- ggplot(X, aes(X1, X2, color=as.factor(class))) +
geom_point() +
theme_bw() +
theme(legend.position = "none") +
scale_x_continuous(limits=c(-5, 5)) +
scale_y_continuous(limits=c(-5, 5)) +
geom_abline(intercept = intercept, slope = slope)

# Density plot
p2 <- ggplot(LD, aes(x = LD1)) +
geom_density(aes(fill = as.factor(class), y = ..scaled..)) +
theme_bw() +
theme(legend.position = "none")

grid.newpage()
print(p1)
vp <- viewport(width = .7, height = 0.6, x = 0.5, y = 0.3, just = c("centre"))
pushViewport(vp)
print(p2, vp = vp)

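Since the question allows any language, here is a NumPy-only sketch of the core computation (my own code, not from the question; it simulates two classes like the R code above, computes the Fisher direction $w \propto S_W^{-1}(m_2 - m_1)$, and projects the data onto it — the 1-D projections `z1`/`z2` are exactly what the rotated density inset in the figure displays):

```python
import numpy as np

rng = np.random.default_rng(1)
# two Gaussian classes, mirroring the R simulation (rho = 0.8, s1 = 1, s2 = 3)
cov = [[1.0, 2.4], [2.4, 9.0]]
X1 = rng.multivariate_normal([2, -4], cov, size=50)
X2 = rng.multivariate_normal([2, 6], cov, size=50)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
# within-class scatter matrix S_W
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w = np.linalg.solve(Sw, m2 - m1)   # Fisher direction, w ∝ S_W^{-1}(m2 - m1)
w /= np.linalg.norm(w)

# 1-D projections; plotting their two class-wise densities along the
# direction w reproduces the inset of the textbook figure
z1, z2 = X1 @ w, X2 @ w
```

The decision boundary in the scatterplot is then the line through the midpoint of the projected class means, perpendicular to `w`.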

### TheoryOverflow

#### Complexity analysis on directed acyclic graphs [on hold]

Consider a directed acyclic graph (DAG) -- that is, a directed graph without any directed loops. An example of such a graph is shown below.

I assume that each node takes on binary values, $0$ or $1$. Now, let $x_i\in\{0,1\}^N$ ($N=6$ in the above DAG) be a valid state on a DAG if for any enabled node in the DAG, all of its predecessor nodes are also enabled. The complete list of such valid states for the above example DAG are shown below.

Let $\mathcal{X} = \{x_1,x_2,\ldots,x_{11}\}$. I wish to characterize the size of $\mathcal{X}$ as a function of the DAG. Specifically, for a given DAG, can we say anything about the order in which $|\mathcal{X}|$ grows as a function of some property of the DAG? My intuition tells me that it should grow on the order of $2^{\max_n \{\text{deg}^-(n)\}}$ where $\text{deg}^-(n)$ denotes the indegree of node $n$ but I can't seem to prove this. Does anyone have any ideas or suggestions for some relevant literature?

EDIT:

Thank you for the comments so far Kaveh and Ricky Demer. I've included a couple of examples of extreme cases in the figure below (also, I should mention that I'm only interested in connected graphs).

From the above figure, the graph on the left has $2^6+1 = 65$ valid states, whereas the one on the right only has $7$ valid states. What property of these graphs results in such a drastic difference in dimensionality of $\mathcal{X}$? Assigning a notion of "width" and "height" to the DAG -- the graph on the left has a large "width" and a small "height", whereas the graph on the right has a small "width" and large "height" -- it seems like the dimension grows on the order of a function of "width/height". Is this a correct way of thinking about this problem? Can this idea be generalized to any DAG?
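The counts quoted above are easy to confirm by brute force. A small sketch (my own code; the two example graphs are my reconstructions matching the stated counts — a 6-node chain for the "tall" graph with 7 valid states, and a single root feeding six children for the "wide" graph with $2^6 + 1 = 65$):

```python
def count_valid_states(n, edges):
    """Count valid states of a DAG on nodes 0..n-1 by brute force:
    a subset is valid iff every member has all its predecessors in the
    subset (i.e., the enabled set is down-closed)."""
    preds = {v: set() for v in range(n)}
    for u, v in edges:
        preds[v].add(u)
    count = 0
    for mask in range(1 << n):
        S = {i for i in range(n) if (mask >> i) & 1}
        if all(preds[v] <= S for v in S):
            count += 1
    return count

# 6-node chain: only the 7 "prefix" states are valid
chain = [(i, i + 1) for i in range(5)]
# one root feeding six children: the empty state, plus the root with any
# subset of children, giving 2^6 + 1 = 65 valid states
star = [(0, i) for i in range(1, 7)]
```

In order-theoretic terms these valid states are the order ideals (down-sets) of the DAG's reachability order, which may be a useful search term for the counting question.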

### TheoryOverflow

#### Tree decomposition for DAGs

Tree decompositions and treewidth are a standard way to measure how close an undirected graph is to a tree. I am studying decompositions of directed acyclic graphs (DAGs), and have come to define them as follows:

Given a DAG $G = (V, E)$, letting $G'$ be the undirected graph obtained by forgetting about edge orientations, a tree decomposition of $G$ is a tree decomposition $T$ of $G'$ as a rooted, directed tree, such that for any bag $b \in T$ and any vertex $v$ of $V$ in $b$, if $v$ occurs in none of the children of $b$, then for any directed edge $(u, v) \in E$, we have $u \in b$. The width of $G$ under this definition is then, as usual, the minimum across all decompositions $T$ of the maximal cardinality of a bag in $T$ minus one.

My general question is: Is such a notion of tree decomposition of a DAG known?

I know that there are existing definitions of tree decompositions for directed graphs, such as D-width, DAG-width, and directed treewidth. However, I don't think they are related to this definition, because, according to all of them, DAGs have low treewidth. Indeed, these definitions consider acyclic graphs as "simple". By contrast my definition only applies to DAGs, and its extension to general directed graphs is not interesting: unless I'm wrong, it implies that all elements of each strongly connected component must co-occur in a bag.

Further, in my case, the width of a DAG may be more than that of the corresponding undirected graph. In fact, a tree decomposition of a DAG $G$ in this sense is a standard tree decomposition of the moral graph of $G$, with additional conditions enforcing it to have a certain "directed" shape. I think the treewidth of the DAG in this sense can still be arbitrarily larger than that of the moral graph, but I'm not sure how to characterize the DAGs that would have "bounded treewidth" in the sense I proposed.

Motivation. In my context, the DAG is a circuit, and I use the tree decomposition to reason about valuations. The property that I require is designed to ensure the following: when processing the tree decomposition bottom-up, when we reach a new bag and we see a new element, we can examine its valuations based on that of its children, which are known because the children must be in the bag: we do not need to guess the valuation (as we would have to if it depended on nodes we haven't seen yet). I suspect that there may also be a relationship to inference in graphical models, where message passing needs to be done in only one direction, but I was unable to find references.

### High Scalability

#### How Wistia Handles Millions of Requests Per Hour and Processes Rich Video Analytics

This is a guest repost from Christophe Limpalair of his interview with Max Schnur, Web Developer at  Wistia.

Wistia is video hosting for business. They offer video analytics like heatmaps, and they give you the ability to add calls to action, for example. I was really interested in learning how all the different components work and how they’re able to stream so much video content, so that’s what this episode focuses on.

## What does Wistia’s stack look like?

As you will see, Wistia is made up of different parts. Here are some of the technologies powering these different parts:

## What scale are you running at?

### QuantOverflow

#### How to price a path dependent exchange option using?

Assume you have two stocks $S$ and $P$ so that at initial time $t = 0$: $S_0 > P_0$.

You bought an option which pays off $S_T - P_T$ as long as $S_t > P_t$ holds throughout the time $0 < t < T$.

What would the price of such option be?

*I am looking for a non-arbitrage argument avoiding any specific distributional assumptions (log-normal, normal, etc.) if possible.*

#### Jabbour-Kramin-Young ABMC Binomial Parameterization

The JKY ABMC Model (taken from Jabbour, et al. 2001) parameterizes the binomial model (in a risk-neutral world) such that,

$u = e^{r\Delta t} + e^{r\Delta t}\sqrt{e^{\sigma^2\Delta t} - 1}$

$d = e^{r\Delta t} - e^{r\Delta t}\sqrt{e^{\sigma^2\Delta t} - 1}$

JKY continue and say that this is equivalent to,

$u = 1 + \sigma\sqrt{\Delta t} + R\Delta t + \mathcal O(\Delta t^\frac{3}{2})$

$d = 1 - \sigma\sqrt{\Delta t} + R\Delta t + \mathcal O(\Delta t^\frac{3}{2})$

I'm having trouble seeing this rigorously. Specifically, I can find that the first term $e^{r\Delta t} = 1 + R\Delta t + \mathcal O(\Delta t^2)$ from the Taylor expansion of $e^x$, but I'm having trouble seeing how the second term contributes the $\pm\sigma\sqrt{\Delta t}$ and how it leads to the restriction of the error to $\mathcal O(\Delta t^\frac{3}{2})$.

Thank you
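For what it's worth, here is one way to see where the second term's expansion comes from (a sketch in the same notation as above):

```latex
\begin{align*}
e^{\sigma^2\Delta t} - 1 &= \sigma^2\Delta t + \mathcal O(\Delta t^2), \\
\sqrt{e^{\sigma^2\Delta t} - 1} &= \sigma\sqrt{\Delta t}\,\sqrt{1 + \mathcal O(\Delta t)}
  = \sigma\sqrt{\Delta t} + \mathcal O(\Delta t^{3/2}), \\
e^{r\Delta t}\sqrt{e^{\sigma^2\Delta t} - 1}
  &= \bigl(1 + \mathcal O(\Delta t)\bigr)\bigl(\sigma\sqrt{\Delta t} + \mathcal O(\Delta t^{3/2})\bigr)
  = \sigma\sqrt{\Delta t} + \mathcal O(\Delta t^{3/2}).
\end{align*}
```

Adding this to $e^{r\Delta t} = 1 + R\Delta t + \mathcal O(\Delta t^2)$ then yields $u = 1 + \sigma\sqrt{\Delta t} + R\Delta t + \mathcal O(\Delta t^{3/2})$, and subtracting yields $d$; the cross term of order $\Delta t \cdot \sqrt{\Delta t}$ is what caps the error at $\mathcal O(\Delta t^{3/2})$.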

### TheoryOverflow

#### Geometric picture behind quantum expanders

A $(d,\lambda)$-quantum expander is a distribution $\nu$ over the unitary group $\mathcal{U}(d)$ with the property that: a) $|\mathrm{supp} \ \nu| =d$, b) $\Vert \mathbb{E}_{U \sim \nu} U \otimes U^{\dagger} - \mathbb{E}_{U \sim \mu_H} U \otimes U^{\dagger}\Vert_{\infty} \leq \lambda$, where $\mu_H$ is the Haar measure. If instead of distributions over unitaries we consider distributions over permutation matrices, it's not difficult to see that we recover the usual definition of a $d$-regular expander graph. For more background, see e.g.: Efficient Quantum Tensor Product Expanders and k-designs by Harrow and Low.

My question is - do quantum expanders admit any kind of geometric interpretation similar to classical expanders (where spectral gap $\sim$ isoperimetry/expansion of the underlying graph)? I don't define "geometric realization" formally, but conceptually, one could hope that purely spectral criterion can be translated to some geometric picture (which, in the classical case, is the source of mathematical richness enjoyed by expanders; mathematical structure of quantum expanders seem to be much more limited).

#### Submatrix of small rank

Let $G=(V,E)$ be a graph with adjacency matrix $M=(m_{ij};i,j \in V )$ over $\mathbb{F}_2$ and $k \in \mathbb{Z^+}$. How can we find in polynomial time a subset $A \subseteq V$ such that

1. The rank of the submatrix $M[A, V\setminus A]$ is at most $k$,

2. $|A|\geq |V|/c$, $|V \setminus A|\geq |V|/c$, for some constant $c>1$,

where $M[A, V\setminus A]$ denotes the submatrix $(m_{ij};i \in A, j \in V \setminus A)$.

Note: Assume that the existence of such a subset $A$ is guaranteed.

### Planet Emacsen

#### sachachua: 2015-11-23 Emacs News

• Org Mode
• Configuration
• Email
• Coding
• Others
• Dev news
• New packages on MELPA

Links from reddit.com/r/emacs, Hacker News, planet.emacsen.org, Youtube, the Emacs commit log, the changes to the Emacs NEWS file, and emacs-devel

Past Emacs News round-ups

The post 2015-11-23 Emacs News appeared first on sacha chua :: living an awesome life.

#### Hackfest OpenBSD presentations

Two OpenBSD developers gave presentations at this year's Hackfest security conference in Quebec. The videos of both are now online for your viewing pleasure:

• "Kernel W^X Improvements In OpenBSD" by Mike Larkin (mlarkin@) (slides)
• "Pledge: A New Security Technology in OpenBSD" by Theo de Raadt (deraadt@) (slides)
### StackOverflow

#### Performing K-fold Cross-Validation: Using Same Training Set vs. Separate Validation Set

I am using the Python scikit-learn framework to build a decision tree. I am currently splitting my training data into two separate sets, one for training and the other for validation (implemented via K-fold cross-validation).

To cross-validate my model, should I split my data into two sets as outlined above or simply use the full training set? My main objective is to prevent overfitting. I have seen conflicting answers online about the use and efficacy of both these approaches.

I understand that K-fold cross-validation is commonly used when there is not enough data for a separate validation set. I do not have this limitation. Intuitively speaking, I believe that employing K-fold cross-validation in conjunction with a separate dataset will further reduce overfitting.

Is my supposition correct? Is there a better approach I can use to validate my model?

Split Dataset Approach:

x_train, x_test, y_train, y_test = train_test_split(df[features], df["SeriousDlqin2yrs"], test_size=0.2, random_state=13)

dt = DecisionTreeClassifier(min_samples_split=20, random_state=99)
dt.fit(x_train, y_train)

scores = cross_val_score(dt, x_test, y_test, cv=10)


Training Dataset Approach:

x_train=df[features]
y_train=df["SeriousDlqin2yrs"]

dt = DecisionTreeClassifier(min_samples_split=20, random_state=99)
dt.fit(x_train, y_train)

scores = cross_val_score(dt, x_train, y_train, cv=10)
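For comparison, here is a sketch of the usual pattern (on synthetic data, using the modern `sklearn.model_selection` module): hold out a test set, run K-fold cross-validation on the training split only for model assessment, and touch the held-out test split exactly once at the end.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for df[features] / df["SeriousDlqin2yrs"]
X, y = make_classification(n_samples=500, n_features=8, random_state=13)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=13)

dt = DecisionTreeClassifier(min_samples_split=20, random_state=99)

# K-fold CV on the training split only: estimates generalization
# without ever touching the held-out test data.
cv_scores = cross_val_score(dt, x_train, y_train, cv=10)

# Fit on the full training split, then evaluate once on the test set.
dt.fit(x_train, y_train)
test_acc = dt.score(x_test, y_test)

print(cv_scores.mean(), test_acc)
```

Note that cross-validating on `x_test` (as in the "Split Dataset Approach" above) estimates performance of models trained on fractions of the small test set, which is rarely what is wanted.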


### TheoryOverflow

#### Evidence that $\mathsf{P/poly}$ is more powerful than $\mathsf P$ [on hold]

What evidence is there that suggests $\mathsf{P\neq P/poly}$?

What are the consequences if $\mathsf{P=P/poly}$ holds?

### Daniel Lemire

#### Is peer review slowing down science and technology?

Ten years ago, a team led by Irina Conboy at the University of California at Berkeley showed something remarkable in a Nature paper: if you take old cells and put them in a young environment, you effectively rejuvenate them. This is remarkable work that was cited hundreds of times.

Their work shows that vampire stories have a grain of truth in them. It seems that old people could be made young again by using the blood of the young. But unlike vampire stories, this is serious science.

So whatever happened to this work? It was cited and it led to further academic research… There were a few press releases over the years…

But, on the whole, not much happened. Why?

One explanation could be that the findings were bogus. Yet they appear to be remarkably robust.

The theory behind the effect also appears reasonable. Our bodies are made of cells, and these cells are constantly being reconstructed and replenished. As you age, this process slows down.

Some scientists believe that the process slows down to protect us from further harm. It is like driving an old car: you do not want to push it too hard so you drive ever more slowly as the car gets older. Others (like Conboy I suspect) appear to believe that it is the slowing down of the repair itself that causes ill-health as we age.

But whatever your favorite theory is… what Conboy et al. showed is that you could re-activate the repair mechanisms by fooling the cells into thinking that they are in a young body. At the very least, this should lead to an increased metabolism… with the worst case scenario being a much higher rate of cancer and related diseases… and the best case being a reversal of aging.

We have some elegant proof of principles, like the fact that oxytocin appears to rejuvenate old muscles so that they become seemingly indistinguishable from young muscles. (You can order oxytocin on Amazon.com.)

So why did we not see much progress in the last ten years? Conboy et al. have produced their own answer regarding this lack of practical progress:

If all this has been known for 10 years, why is there still no therapeutics?

One reason is that instead of reporting broad rejuvenation of aging in three germ layer derivatives, muscle, liver, and brain by the systemic milieu, the impact of the study published in 2005 became narrower. The review and editorial process forced the removal of the neurogenesis data from the original manuscript. Originally, some neurogenesis data were included in the manuscript but, while the findings were solid, it would require months to years to address the reviewer’s comments, and the brain data were removed from the 2005 paper as an editorial compromise. (…)

Another reason for the slow pace in developing therapies to broadly combat age-related tissue degenerative pathologies is that defined strategies (…) have been very difficult to publish in high impact journals; (…)

Your best strategy in such case might be to simply “give up” and focus on producing “uncontroversial” results. So there are research projects that neither I nor many other researchers will touch…

I was reminded of what a great computer scientist, Edsger Dijkstra, wrote on this topic:

Not only does the mechanism of peer review fail to protect us from disasters, in a certain way it guarantees mediocrity (…) At the time it is done, truly original work—which, in the scientific establishment, is as welcome as an unwanted baby (…)

Dijkstra was a prototypical blogger: he wrote papers that he shared with his friends. Why can’t Conboy et al. do the same thing and “become independent” of peer review? Because they fear that people would dismiss their work as being “fringe” research with no credibility. They would not be funded. Without funding, they would quickly lose their laboratory, and so forth.

In any case, the Conboy et al. story reminds us that seemingly innocent cultural games, like peer review, can have a deep impact on what gets researched and how much progress we make over time. Ultimately, we have to allocate finite resources, if only the time of our trained researchers. How we do it matters very much.

Thankfully, since Conboy et al. published their 2005 paper, the world of academic publishing has changed. Of course, the underlying culture can only change so much; people are still tailoring their work so that it will get accepted in prestigious venues… even if it makes said work much less important and interesting… But I also think that the culture is being transformed. Initiatives like the Public Library of Science (PLoS), launched in 2003, have shown the world that you could produce high impact serious work without going through an elitist venue.

I think that, ultimately, it is the spirit of open source that is gaining ground. That's where the true meaning of science thrives: it does not matter who you are, what matters is whether what you are proposing works. Good science is good science no matter what the publishing venue is… And there is more to science than publishing papers… Increasingly, researchers share their data and software… instead of trying to improve your impact through prestige, you can improve your impact by making life easier for people who want to use your work.

The evolution of how we research may end up accelerating research itself…

### StackOverflow

#### How do you visualize a tree in ID3 using weka?

I want to make a decision tree using weka in the format of ID3, but when I do this, the visualization option is unable to be chosen: I can't select the option to view the decision tree.

### QuantOverflow

#### Computation of option vega under CEV

It is easy to define the option vega $\nu=\frac{\partial C}{\partial \sigma}$ under Black Scholes model since volatility is a single quantity.

However, under the CEV or a local volatility model, it is confusing for me to compute the option vega.

For example, the volatility function is defined as $\sigma(S)=\delta S^{\beta}$. Then how does one compute the sensitivity of the option price with respect to $\sigma(S)$?

At first glance, I think $$\nu = \frac{\partial C}{\partial \sigma(S)}= \frac{\partial C}{\partial S}\frac{\partial S}{\partial \sigma(S)}=\Delta\frac{1}{\delta\beta S^{\beta-1}}$$

Is this right? It seems wrong to me.
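For context, one common convention (a sketch, not part of the question): under a parameterized local-volatility model such as CEV, vega is usually taken with respect to the model parameter $\delta$ (or to a parallel bump of the whole volatility function), since $\sigma(S)$ varies with the state and cannot be perturbed independently of $S$:

```latex
\nu_{\mathrm{CEV}} \;=\; \frac{\partial C}{\partial \delta}
\;\approx\; \frac{C(\delta+\varepsilon)-C(\delta-\varepsilon)}{2\varepsilon},
\qquad \sigma(S)=\delta S^{\beta}.
```

On this view the chain-rule computation above conflates the direct dependence of $C$ on $S$ with its dependence through $\sigma(S)$, which is why the result looks wrong.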

### TheoryOverflow

#### Finding the number of independent rows of a matrix

There is an $n\times n$ matrix $A$, and we are asked to find the number $N(A)$ of independent rows in it, i.e. rows that are not a linear combination of the other rows. Clearly, if $rank(A)=n$, then $N(A)=n$, but for $rank(A)=n-1$, $N(A)$ can be anywhere between $0$ and $n-1$.

A straightforward way to check if a row is independent is to check if removing it from $A$ lowers the rank of $A$. Assuming calculating rank requires $O(n^\alpha)$ operations, calculating $N(A)$ this way would require $O(n^{\alpha+1})$. Are there more efficient ways to find $N(A)$?
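The straightforward $O(n^{\alpha+1})$ procedure described above can be sketched as follows (a naive baseline, not the asked-for speedup):

```python
import numpy as np

def count_independent_rows(A):
    """Count rows that are NOT a linear combination of the other rows,
    i.e. rows whose removal lowers the rank of A."""
    r = np.linalg.matrix_rank(A)
    return sum(
        np.linalg.matrix_rank(np.delete(A, i, axis=0)) < r
        for i in range(A.shape[0])
    )

# Every row of the identity is independent...
print(count_independent_rows(np.eye(3)))  # 3
# ...but with [1,1] = [1,0] + [0,1] appended, no single row is essential.
print(count_independent_rows(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])))  # 0
```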

### CompsciOverflow

#### What is the difference between superposition and paramodulation?

I am currently writing a paper about automated theorem proving in first-order logic. Equality is not uncommon in mathematical problems, and almost every theorem prover, like VAMPIRE or SPASS, has a calculus for equality. But most papers write about the "superposition" calculus. A simple Google search did not help me find any information about this term, only the Wikipedia page, which says it "can be used for first-order logic with equality".

Another paper referenced paramodulation-based theorem proving, which describes the concept of paramodulation for theorem proving. It seems that superposition is some modified version of paramodulation, but I don't understand why, or in what way.

So, is there any explanation of this calculus, or can someone give me some hints on what is different from paramodulation?

### StackOverflow

#### What is a mapping between natural numbers and valid simply typed lambda calculus terms?

Is there any efficient algorithm that maps between well-typed, closed terms of the simply typed lambda calculus and natural numbers? For example, using de Bruijn indices (and probably in an incorrect order):

0 → (λ 0)
1 → (λ (λ (0 1)))
2 → (λ (λ (1 0)))
3 → (λ 0 (λ 0))
4 → (λ (λ 0) 0)
5 → (λ (λ 1) 0)
6 → ... so on


Related questions: is there an algorithm that maps between natural numbers and normalized terms of the simply typed lambda calculus? Also, the same questions apply to the untyped lambda calculus.
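For the untyped fragment at least, one bijection can be sketched by enumerating closed de Bruijn terms in order of size and letting $n$ index into that stream (filtering the stream for typability or normality would restrict it accordingly; the names and ordering below are illustrative, not the ones in the question):

```python
from itertools import count

def terms(size, depth):
    """Yield de Bruijn terms with exactly `size` nodes that are
    closed under `depth` enclosing lambdas."""
    if size == 1:
        for i in range(depth):
            yield ("var", i)
    if size >= 2:
        for body in terms(size - 1, depth + 1):
            yield ("lam", body)
    for k in range(1, size - 1):
        for f in terms(k, depth):
            for a in terms(size - 1 - k, depth):
                yield ("app", f, a)

def nth_closed_term(n):
    """Map a natural number to the n-th closed term in size order."""
    i = 0
    for size in count(1):
        for t in terms(size, 0):
            if i == n:
                return t
            i += 1

print(nth_closed_term(0))  # ('lam', ('var', 0)), i.e. (λ 0)
```

The inverse direction (term to number) follows by enumerating until the term is found; making either direction *efficient*, as the question asks, is the hard part.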

### Fred Wilson

#### Quizlet

There are some investments that take years to make. They are often our best investments. Quizlet took something like five years to go from a company we got interested in to a USV investment.

In March 2009, we hosted an event we called Hacking Education. That was the official start of our focus on education. From that event came a thesis on how we would approach investing in education. We would invest in lightweight services and networks that allowed anyone to learn anything. We would not invest in services sold top down to the existing K-12 and higher education system. We wanted to obliterate, not automate.

We started hunting around for services and networks that fit our thesis. One that caught our attention was Quizlet, the leading web and mobile studying tool. We got an intro through Christina. Eventually Andy got a meeting. We found out that Quizlet had been bootstrapped, was profitable, and was not interested in raising outside capital. But Andy did not take no for an answer. He kept calling on them. He brought me to meet the two Quizlet leaders, Andrew and Dave, in September 2012. We got the same story in that meeting but we did make an impression. We started inviting them to our events in SF and they usually would come. So we kept doing that and kept stopping by to say hi when we were in SF.

Earlier this year Dave called me to say that they were going to raise outside capital. He and Andrew had concluded that the opportunity to build and develop peer to peer learning and studying tools for web and mobile was so large that they could not continue to bootstrap. So we jumped onto the opportunity and threw ourselves at it. That process had a number of fits and starts but we hung in there and eventually the financing came together the way Andrew and Dave wanted it to and we joined our friends at Costanoa, Altos, and Owl in a $12mm Series A round for a ten year old company.

Just writing those last few words makes me happy. You don’t see many Series A rounds for ten year old companies. But when you do, they are generally good ones to do.

So what is Quizlet? Well if you have kids in middle, high school, or college, they probably use it. Quizlet is a studying/learning tool written by Andrew Sutherland for his own use ten years ago when he was studying for a French test. He put it out on the web a bit later. He was joined by Dave Margulius who helped him turn Quizlet into a business by implementing an elegant freemium business model. Quizlet is free for anyone to use. But if you want to do certain higher value things, you can pay a small amount every month for access to them.

Quizlet lets anyone create a study set and practice it online and on mobile. And it also allows anyone to use someone else’s study set. Quizlet is peer to peer learning. Over 100mm study sets have been created by users and over 1bn study sessions have been done on Quizlet. Quizlet has been a top ten education app in the mobile app stores for years, a fact I was constantly reminded of every time I went to look at the education category in the years we were chasing this investment.

Here are some examples I just found by searching around:

Just imagine a massively open database of 100mm study sets like that which is growing by the day.
And you get why we have been and continue to be so interested in Quizlet. There are over 7bn learners on planet earth. Within a decade, the vast majority of them will have a mobile device connected to this massively open database of study sets which is available for free. These 7bn learners will be able to contribute and consume these study sets. And in the process the world will become more educated and more literate.

That is hacking education and that is why USV is so excited to, finally, be an investor in Quizlet.

### Planet Theory

#### Star Trek Computing

In the wake of Leonard Nimoy's death last February, I decided to rewatch the entire original Star Trek series, all 79 episodes. I had watched them each many times over in high school in the 70's, though the local station removed a scene or two from each episode to add commercial time and I often missed the opening segment because I didn't get home from school in time. Back in those stone ages we had no DVR or other method to record shows. I hadn't seen many episodes of the original series since high school. Now I can watch the entire episodes whenever I want in full and in order through the magic of Netflix. I finished this quest a few days ago. Some spoilers below.

I could talk about the heavy sexism, the ability to predict future technologies (the flat screen TV in episode 74), the social issues in the 23rd century as viewed from the 60's, or just the lessons in leadership you can get from Kirk. Given the topic of this blog, let's talk about computing in Star Trek, which they often just get so wrong, such as when Spock asks the computer to compute the last digit of π to force Jack-the-Ripper to remove his consciousness from the ship's computers. Too many episodes end with Kirk convincing a computer or robot to destroy itself. I'd like to see him try that with Siri.

In one such episode, "The Ultimate Computer", a new computer is installed in the Enterprise that replaces most of the crew.
A conversation between Kirk and McCoy sounds familiar to many we have today (source).

MCCOY: Did you see the love light in Spock's eyes? The right computer finally came along. What's the matter, Jim?

KIRK: I think that thing is wrong, and I don't know why.

MCCOY: I think it's wrong, too, replacing men with mindless machines.

KIRK: I don't mean that. I'm getting a Red Alert right here. (the back of his head) That thing is dangerous. I feel. (hesitates) Only a fool would stand in the way of progress, if this is progress. You have my psychological profiles. Am I afraid of losing my job to that computer?

MCCOY: Jim, we've all seen the advances of mechanisation. After all, Daystrom did design the computers that run this ship.

KIRK: Under human control.

MCCOY: We're all sorry for the other guy when he loses his job to a machine. When it comes to your job, that's different. And it always will be different.

KIRK: Am I afraid of losing command to a computer? Daystrom's right. I can do a lot of other things. Am I afraid of losing the prestige and the power that goes with being a starship captain? Is that why I'm fighting it? Am I that petty?

MCCOY: Jim, if you have the awareness to ask yourself that question, you don't need me to answer it for you. Why don't you ask James T. Kirk? He's a pretty honest guy.

Later in the episode the computer starts behaving badly and Kirk has to convince it to shut itself down. But what if the computer just did its job? Is that our real future: ships that travel to stars controlled only by machine? Or are we already there?

### Lobsters

#### ButterflyNet – Fully async networking framework for Python 3.4

### CompsciOverflow

#### Coin Change Problem (Greedy Algorithm)

In the Coin Change Problem, if the ratio of coin values (coin(i+1)/coin(i)) is always increasing, can we use the greedy algorithm?

Example: 1, 3, 4 are the denominations of coins. If I want to pay Rs. 6 then the smallest coin set would be (3, 3). This solution set cannot be found by the greedy algorithm because it does not satisfy (4/3 > 3/1).

### Fefe

#### Is anyone here a Dell customer? They are apparently shipping ...

Is anyone here a Dell customer? They are apparently shipping a backdoor CA with their Windows. But, between us, anyone who buys a PC and doesn't throw out the preinstalled Windows and do a fresh reinstall as the very first step is beyond help anyway. That's why it was such an outrage when Lenovo installed a BIOS backdoor so that even after a fresh reinstall the Windows could be compromised again.

### StackOverflow

#### Use attribute and target matrices for TensorFlow Linear Regression Python

I'm trying to follow this tutorial. TensorFlow just came out and I'm really trying to understand it. I'm familiar with penalized linear regression like Lasso, Ridge, and ElasticNet and its usage in scikit-learn.

For scikit-learn Lasso regression, all I need to input into the regression algorithm is DF_X [an M x N dimensional attribute matrix (pd.DataFrame)] and SR_y [an M dimensional target vector (pd.Series)]. The Variable structure in TensorFlow is a bit new to me and I'm not sure how to structure my input data into what it wants.

It seems as if softmax regression is for classification. How can I restructure my DF_X (M x N attribute matrix) and SR_y (M dimensional target vector) to input into tensorflow for linear regression?

My current method for doing a Linear Regression uses pandas, numpy, and sklearn and it's shown below.
I think this question will be really helpful for people getting familiar with TensorFlow:

#!/usr/bin/python
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.linear_model import LassoCV

#Create DataFrames for attribute and target matrices
DF_X = pd.DataFrame(np.array([[0,0,1],[2,3,1],[4,5,1],[3,4,1]]),columns=["att1","att2","att3"],index=["s1","s2","s3","s4"])
SR_y = pd.Series(np.array([3,2,5,8]),index=["s1","s2","s3","s4"],name="target")

print DF_X
#    att1  att2  att3
#s1     0     0     1
#s2     2     3     1
#s3     4     5     1
#s4     3     4     1

print SR_y
#s1    3
#s2    2
#s3    5
#s4    8
#Name: target, dtype: int64

#Create Linear Model (Lasso Regression)
model = LassoCV()
model.fit(DF_X,SR_y)

print model
#LassoCV(alphas=None, copy_X=True, cv=None, eps=0.001, fit_intercept=True,
#max_iter=1000, n_alphas=100, n_jobs=1, normalize=False, positive=False,
#precompute='auto', random_state=None, selection='cyclic', tol=0.0001,
#verbose=False)

print model.coef_
#[ 0.         0.3833346  0.       ]

### Dave Winer

#### The problem is America, not Trump

People are saying that the Trump campaign is turning Nazi. I'd like to offer another theory. It's turning American.

We in America paint the past as a Norman Rockwell painting. White, suburban, not too rich, but not poor either. Everyone dresses well. Grandpa smokes a pipe and grandma makes great apple pie. The kids play musical instruments and baseball.

But that is not our past. We brought Africans to America to be our slaves. They didn't need yellow stars because their skin color was enough of a label. We beat them, chained them, murdered them, all the things Nazis did to Jews, over a much longer period of time. We took our land from Native Americans and killed them too. We victimize people because of where they come from, how they dress, what books they read, the god they worship, for being too liberal or loving the wrong person.

We have done some terrible things here. So you don't have to go to Germany for prior art. There's plenty of it right here in the U.S. of A.

The problem isn't Trump. He's an opportunist. If people voted on issues, he would be a fountain of issues. But they don't. They vote for people who make them feel good and powerful and deserving of love. The problem isn't Trump, it's America.

### Lobsters

#### The Class That Never Ends

### Dave Winer

#### 20 vs 60

I listened to an interview on NPR this morning with violinist Itzhak Perlman. They asked if he knew more about the violin now, as he turns 70, than when he was 20. He said "no, but.." and paused. At this point my brain filled in the answer. "But I know myself much better!" Turns out that isn't what he said, but it's still an important idea that I'd like to pass on to my younger friends (I am 60).

When you're 20 you don't even really see yourself. You and the world are the same thing. That's why young people feel there is such a thing as absolute right and wrong in all cases. The world seems simple. It's all about me! And anything that I don't like obviously is wrong, and anything I do like is equally obviously right.

What happens as you grow older is that this sense of being everything can fade away, and as it does, other people and things become visible. You see that there are lots of different types of people, with different experiences, different ways of viewing the world. You can delight in this, and learn from it, and use it to further define yourself. At 60, I often laugh at myself: "Oh that's just something Dave does." That would have never occurred to me at 20.

On the other hand, not to say there aren't wonderful things about being 20. Everything is so fresh and new, the world and time seem unlimited, and your abilities. Falling in love at any age is a miracle. And there are rewards that only come from knowledge and experience.

PS: I'm also a better writer at 60.

### Lobsters

#### Four Impossible Things Before Key Escrow

#### Representing a Toroidal Grid

### Fefe

#### The Central Council of Jews comes out in favor of refugee caps ...
The Central Council of Jews comes out in favor of caps on refugee numbers. But not for religious reasons, because these are Muslims. No. "Looking at the places and countries in Europe where the biggest problems exist, one could come to the conclusion that this is not a religious problem but an ethnic one." For ethnic reasons.

#### Does anyone here use reddit? The old privacy policy, ...

Does anyone here use reddit? The old privacy policy, the new privacy policy. The old one says "Your Private Information Is Never for Sale". The new one doesn't. Instead it says something about advertising networks.

#### Have you ever wondered where the CIA actually gets ...

Have you ever wondered where the CIA actually gets its cover passports for Germany, given that our passports are supposedly so utterly forgery-proof? Well? You'll NEVER guess! Through the NSA inquiry committee we learned that the Hauptstelle für Befragungswesen (HBW), a former front agency of the BND, also issued cover documents. Not only to employees of the BND itself, but also to members of foreign intelligence services, including Britons and Americans. If you're now thinking "hey, brilliant intelligence work, now we know under which names foreign spies are staying here!1!!", I have bad news for you: in the answer to a parliamentary question (OCR full text below) by Left Party MP Andrej Hunko, we see how little overview the federal government has of the extent to which false papers were handed out, or wants to have. Oh no, we don't look at that too closely! Those are our friends, after all!1!!

#### The British are raising their anti-terror budget by 30%, buying ...

The British are raising their anti-terror budget by 30%, and buying F-35s with it. The F-35, we recall, was this one. Now, an F-35 costs a lot of money; where are they saving it? The announcement comes on the eve of expected massive spending cuts that could decimate public services. Given the cuts in the UK, let me point to this once more. To pay for increased anti-terrorism spending, Osborne plans to slash social welfare spending in what the International Monetary Fund (IMF) is calling the most aggressive austerity plan amongst the world's developed nations between now and 2020. The UK chancellor also refused to rule out introducing spending cuts to the police force. "Every public service has to make sure it is spending money well," he told the BBC. Funny, in my worldview you create terrorism with F-35s and fight it with the police. Cameron seems to see it exactly the other way around. Anyone now picturing a bloated bureaucracy at the British police that is ripe for cutting should be reminded of this.

### Lobsters

#### Getting off the ground in Elm: project setup

### CompsciOverflow

#### Removing unreachable states from $M$ does not change $L(M)$

What is the idea behind proving that removing unreachable states from a DFA $M$ does not change $L(M)$?

I don't know what cases I need to consider. Consider any $w\in \Sigma^*$ and then check the cases $w \in L(M)$ or $w \not \in L(M)$? Then I was thinking that there could be some lemmas regarding DFA minimization, sort of like "Assume the DFA is not minimal, i.e. contains unreachable states", then "the DFA can be minimized", then "the minimal DFA recognizes the same language".

In checking $w \in L(M)$ or $w \not \in L(M)$ one should find out (as the answer below points out) that, because of the definition of $L(M)$ and of unreachable states, $w \in L(M)$ does not depend on unreachable states (since acceptance of $w$ involves only reachable states). Therefore, any $w \in L(M)$ is recognised by both the DFA with and without unreachable states. Is this enough?

#### Can a Turing Machine have infinite accept states?

I'm still fairly new to Turing Machines, but I've been doing some research. I know that a Turing Machine can have an infinite tape and that it requires a finite number of states, but does it necessarily follow that a Turing Machine can have an infinite number of accept states? I keep seeing different layouts when formally defining Turing Machines, for example: M = (Q, Σ, Γ, τ, s, F).

1) F ⊆ Q is the set of final or accepting states. (plural)

2) F ⊆ Q is the accept state. (singular)

So I'm just wondering which one is correct? Any help would be greatly appreciated.

#### Can a deterministic language be accepted by a deterministic Push Down Automaton?

I have a question that asks me to show that the PDA of the language L is not deterministic, but that the language is nevertheless deterministic. I was under the assumption that any deterministic language has a PDA that is deterministic. The language in question is: $L = \{w \in \{a,b\}^* : n_a(w) = n_b(w)\}$

### Lobsters

#### Creating Node.js Command Line Utilities to Improve Your Workflow

#### Go Proverbs

#### Stable APIs for the Go language

### QuantOverflow

#### Negative risk neutral probabilities economic argument

We know of plenty of ways to extract risk-neutral distributions from option prices (for example Breeden-Litzenberger) but there is no real analysis on how to interpret negative state prices (Haug 2007 for example). State prices are Arrow-Debreu securities, so it is the price an agent is willing to pay to get $1$ in a particular state and $0$ else.

From equilibrium and utility maximization we know that agents smooth their consumption via the marginal rate of substitution, which describes their risk aversion.

Coming back to the negative probabilities: could it not be possible that there exist actual economic states in which the agent receives money for the contract?

### CompsciOverflow

#### Why use coroutines instead of mutable objects?

What scenarios would cause coroutines to be a better (i.e. more robust, easier to modify) answer than simple mutable objects?

For example, in Fluent Python, the following running average example is used to show what coroutines can do:

def averager():
    total = 0.0
    count = 0
    average = None
    while True:
        term = yield average
        total += term
        count += 1
        average = total / count


Why not just use a class-based object?

class Averager:

    def __init__(self):
        self.total = 0.0
        self.count = 0
        self.average = None

    def send(self, term):
        self.total += term
        self.count += 1
        self.average = self.total / self.count
        return self.average
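Driving the two side by side (a self-contained sketch repeating both definitions from the question) shows they produce identical streams of averages; the difference is mostly stylistic, with the coroutine holding its state in local variables and the class in attributes:

```python
def averager():
    total, count, average = 0.0, 0, None
    while True:
        term = yield average
        total += term
        count += 1
        average = total / count

class Averager:
    def __init__(self):
        self.total, self.count, self.average = 0.0, 0, None

    def send(self, term):
        self.total += term
        self.count += 1
        self.average = self.total / self.count
        return self.average

coro, obj = averager(), Averager()
next(coro)  # prime the coroutine so it advances to the first yield
for term in [10, 30, 5]:
    assert coro.send(term) == obj.send(term)
print(obj.average)  # 15.0
```

The priming step (`next(coro)`) is the one extra obligation the coroutine imposes on its callers.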


### StackOverflow

#### Machine learning prediction algorithm using spark

I am fairly new to machine learning.
I want to use a machine learning algorithm to predict user inputs across multiple fields.
I have datasets in the following format, which I have collected from history and manipulated, and which I will use for training the algorithm.

<user_age><location><user_gender><user_order>


I want to create a model from which I can get prediction results when one or more fields are provided by the user.

Suppose only "user_age" is input; then the model should return the best prediction for the rest of the fields.

Suppose the user has provided all the inputs except <user_order>; then the model should return the best prediction for the order based on the three inputs.

I looked at some classification and clustering algorithms provided by Spark MLlib, but most of them work with integer-based inputs, so I haven't been able to map this use case onto them.

Is there any specific class of algorithms, or a single algorithm, that deals with this kind of use case?

Can someone suggest how this can be achieved?
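
Regarding the integer-input limitation mentioned above: the usual workaround is to index each categorical field to integers before training (Spark's `StringIndexer` in `pyspark.ml.feature` does exactly this). A minimal pure-Python sketch of the idea, with made-up field names and values:

```python
# Sketch of indexing categorical fields to integers, the standard
# preprocessing step before feeding them to MLlib-style algorithms.
# The rows and field values below are invented for illustration.

def build_index(values):
    """Map each distinct value to a small integer, in first-seen order."""
    index = {}
    for v in values:
        if v not in index:
            index[v] = len(index)
    return index

rows = [
    {"location": "NYC", "user_gender": "F", "user_order": "books"},
    {"location": "SFO", "user_gender": "M", "user_order": "games"},
    {"location": "NYC", "user_gender": "M", "user_order": "books"},
]

fields = list(rows[0])
indexes = {f: build_index(r[f] for r in rows) for f in fields}
encoded = [[indexes[f][r[f]] for f in fields] for r in rows]
print(encoded)  # each row is now a vector of integers
```

With every field encoded this way, any of the columns can be treated as the label and the rest as features, which matches the "predict whichever field is missing" setup described above.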

#### Interview: Renato Westphal (renato@)

Renato Westphal (renato@) recently agreed to answer some questions in the wake of committing eigrpd(8):

#### First off, thanks for agreeing to this interview. Tell us a little about yourself, your technical background, and how you came to the OpenBSD project.

My name is Renato Westphal and I was born and live in the south of Brazil. Currently I work for Taghos Tecnologia developing web cache and CDN solutions and in my previous job I worked for a small network equipment manufacturer, which is where my background in routing protocols comes from.

### StackOverflow

#### Why do I get "NameError: name '...' is not defined" in python module?

filename: recom.py

# Returns a distance-based similarity score for person1 and person2
def sim_distance(prefs, person1, person2):
    # Get the list of shared_items
    si = {}
    for item in prefs[person1]:
        if item in prefs[person2]:
            si[item] = 1
    # if they have no ratings in common, return 0
    if len(si) == 0: return 0
    # Add up the squares of all the differences
    sum_of_squares = sum([pow(prefs[person1][item] - prefs[person2][item], 2)
                          for item in prefs[person1] if item in prefs[person2]])
    return 1 / (1 + sum_of_squares)


I am getting this error when I try to do reload(recom):

Traceback (most recent call last):
  File "", line 1, in
NameError: name 'recom' is not defined
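
For context, `reload` only accepts a module object that is already bound in the current session; the NameError above means `recom` was never imported (or its import failed). A small sketch, using `importlib.reload` so it also runs on Python 3, and the standard-library `json` module as a stand-in since `recom.py` is not available here:

```python
# reload() / importlib.reload() only works on an already-imported module.
import importlib
import json  # stand-in for recom.py

json = importlib.reload(json)   # works: the name is bound to a module object

try:
    importlib.reload(recom)     # fails: 'recom' was never imported here
except NameError as err:
    print(err)                  # name 'recom' is not defined
```

So the fix in the question's setting is to run `import recom` once before any `reload(recom)`.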

### CompsciOverflow

#### Finding vertices for which there either exists a path to all other vertices or other vertices have a path to them

Or in other words: find all $v \in V$ such that for every $w \in V$ there exists a path $v \rightarrow w$ or a path $w \rightarrow v$. This is for a directed acyclic graph. I need to find an $O(|E| + |V|)$ algorithm for this.

I can see how to identify if a given vertex meets these traits (perform a BFS starting at that vertex, then do another BFS on the reverse of that graph and see if every vertex was visited in those BFSes). The obvious solution would be to run this on every vertex of the graph, but that will end up being $O(|E||V| + |V|^{2})$.

I've considered identifying strongly connected components, but that doesn't seem like the right approach, since a SCC requires that $v$ and $w$ are mutually reachable, whereas this homework question requires that $v$ and $w$ are only reachable one way.

### CompsciOverflow

#### Interview questions: Temporal and spatial locality when using linked list?

Comment on the probability of seeing good temporal and spatial locality when using a singly linked list. Explain.

Spatial locality: the probability of referencing a particular resource is higher if a resource near it was just referenced. If the linked lists are used more like arrays, with nodes allocated in consecutive locations and not too many insertions/deletions in the middle, then prefetching can fetch the next nodes as a side effect and caches still provide spatial locality.

Temporal locality: a resource that is referenced at one point in time will likely be referenced again in the near future. We can use loops to traverse a singly linked list to search for specific elements, and each time we traverse the list, the first node is always referenced.

This is all I can come up with.

Is there a way that we might be able to improve spatial locality in singly linked lists?
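
One standard technique for the last question (a general idea, not something from the question itself) is to allocate nodes from a contiguous pool, so that neighbours in the list tend to be neighbours in memory. A minimal Python sketch of an array-backed singly linked list:

```python
# Array-backed singly linked list: nodes are slots in contiguous lists,
# and "pointers" are integer indices, so logically adjacent nodes tend to
# be physically adjacent as well, improving spatial locality.

class PooledList:
    def __init__(self):
        self.values = []   # node payloads, allocated contiguously
        self.next = []     # index of the next node, or -1 for end of list
        self.head = -1

    def push_front(self, value):
        self.values.append(value)
        self.next.append(self.head)
        self.head = len(self.values) - 1

    def traverse(self):
        i = self.head
        while i != -1:
            yield self.values[i]
            i = self.next[i]

lst = PooledList()
for v in (3, 2, 1):
    lst.push_front(v)
print(list(lst.traverse()))  # [1, 2, 3]
```

In a language with real pointers (C, C++) the same idea is usually implemented as a node pool or arena allocator; the Python version above only illustrates the index-as-pointer layout.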

### StackOverflow

#### Starting a neural network project

After a long time reading the theory behind neural networks, I finally want to start my own project in object recognition. However, I am struggling to find a practical entry point. I want to use either C#, C++, or C; however, all the new tutorials seem to involve newer languages such as Python.

For a start, I would especially like to implement the concepts from Yann LeCun's publications about object recognition.

Which programming language is recommended? And, much more important: which framework should I use? There seem to be dozens of frameworks (AForge, Apache Mahout, OpenCV), and my knowledge is too theoretical to differentiate between them.

I want to program a simple, independent neural network application that is easy to train, and I don't want to reimplement classes such as neuron or layer, so that I can focus on the architecture at the beginning.

Thanks, and sorry for the simple and probably often-asked question; I just couldn't find anything that matched.

Greetings Nex

### StackOverflow

#### Difference between Hidden Markov models and Particle Filter (and Kalman Filter)

I would like to ask if someone knows the difference (if there is any) between Hidden Markov models (HMM) and Particle Filters (PF), and consequently Kalman Filters, or under which circumstances we should use which algorithm. I'm a student and I have to do a project, but first I have to understand some things.

So, according to the literature, both are state-space models, including hidden (or latent, or unobserved) states. According to Wikipedia (https://en.wikipedia.org/wiki/Hidden_Markov_model): “in HMM, the state space of the hidden variables is discrete, while the observations themselves can either be discrete (typically generated from a categorical distribution) or continuous (typically from a Gaussian distribution). Hidden Markov models can also be generalized to allow continuous state spaces. Examples of such models are those where the Markov process over hidden variables is a linear dynamical system, with a linear relationship among related variables and where all hidden and observed variables follow a Gaussian distribution. In simple cases, such as the linear dynamical system just mentioned, exact inference is tractable (in this case, using the Kalman filter); however, in general, exact inference in HMMs with continuous latent variables is infeasible, and approximate methods must be used, such as the extended Kalman filter or the particle filter.”

But for me this is a bit confusing… In simple words, does this mean the following (based also on more research that I have done):

• In HMM, the state space can be either discrete or continuous, and the observations themselves can likewise be either discrete or continuous. An HMM can be a linear, Gaussian dynamical system, or a non-Gaussian one.
• In PF, the state space can be either discrete or continuous, and the observations can likewise be either discrete or continuous. But a PF handles non-linear (and non-Gaussian?) dynamical systems (is that their difference?).
• The Kalman filter (which also looks the same to me as an HMM) is used when we have a linear, Gaussian dynamical system.
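
To make the linear-Gaussian case in the last bullet concrete, here is a minimal one-dimensional Kalman filter (the model and variable names are my own illustration, not taken from any of the sources mentioned):

```python
# Minimal 1-D Kalman filter: the state drifts with Gaussian process noise
# and is observed directly with Gaussian measurement noise. Because
# everything is linear and Gaussian, the posterior stays Gaussian and is
# tracked exactly by a (mean, variance) pair -- no particles needed.

def kalman_step(mean, var, z, process_var, obs_var):
    # Predict: the state may have drifted, so uncertainty grows.
    mean_pred = mean
    var_pred = var + process_var
    # Update: blend prediction and measurement using the Kalman gain.
    gain = var_pred / (var_pred + obs_var)
    mean_new = mean_pred + gain * (z - mean_pred)
    var_new = (1 - gain) * var_pred
    return mean_new, var_new

mean, var = 0.0, 1.0                  # broad initial belief
for z in [1.2, 0.8, 1.1]:             # noisy observations of a state near 1
    mean, var = kalman_step(mean, var, z, process_var=0.1, obs_var=0.5)
print(mean, var)                      # belief tightens around ~1
```

A particle filter would approximate the same posterior with a cloud of weighted samples, which is what makes it applicable when the dynamics or noise are not linear-Gaussian and this closed-form update no longer exists.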

Also, how do I know which algorithm to choose, because to me they all seem the same... I also found a paper (not in English) which says that although a PF can have linear data (for example, raw data from a Kinect sensor that recognizes a movement), the dynamical system can be non-linear. Can this happen? Is this correct? How?

For gesture recognition, researchers use either HMM or PF, but they don't explain why they select each algorithm… Can anyone help me distinguish these algorithms, understand their differences, and choose the best one?

I'm sorry if my question is too big or some parts are naive, but I couldn't find a convincing and scientific answer anywhere.

#### Azure Machine learning: error with multiclass classification algo

I have a training set and a test set (CSV files with headers) in which I have to classify each value. There are 118,000 unique values in the X column, and only about 13,000 in the y1 column, so there will be 13,000 categories.

From the training set I need only the X and y1 columns to train the model. I need to classify each X value into one of the categories (find the normal form of the initial word). I tried all the multiclass algorithms but failed when trying to evaluate the model.

Visualizing the Score Model module returns this:

What could the problem be? It just returns a -2 error code and this log.

UPD1: Using the Metadata Editor module I made column y1 categorical under Project Columns; nothing seems to have changed.

#### fmin_cg: Desired error not necessarily achieved due to precision loss

I have the following code to minimize the Cost Function with its gradient.

def trainLinearReg( X, y, lamda ):
    # theta = zeros( shape(X)[1], 1 )
    theta = random.rand( shape(X)[1], 1 ) # random initialization of theta

    result = scipy.optimize.fmin_cg( computeCost, fprime = computeGradient, x0 = theta,
                                     args = (X, y, lamda), maxiter = 200, disp = True, full_output = True )
    return result[1], result[0]


But I am having this warning:

Warning: Desired error not necessarily achieved due to precision loss.
Current function value: 8403387632289934651424768.000000
Iterations: 0
Function evaluations: 15


My computeCost and computeGradient are defined as

def computeCost( theta, X, y, lamda ):
    theta = theta.reshape( shape(X)[1], 1 )
    m     = shape(y)[0]
    J     = 0

    h = X.dot(theta)
    squaredErrors = (h - y).T.dot(h - y)
    # theta[0] = 0.0
    J = (1.0 / (2 * m)) * (squaredErrors) + (lamda / (2 * m)) * (theta.T.dot(theta))

    return J[0]

def computeGradient( theta, X, y, lamda ):
    theta = theta.reshape( shape(X)[1], 1 )
    m     = shape(y)[0]
    J     = 0

    h = X.dot(theta)
    squaredErrors = (h - y).T.dot(h - y)
    # theta[0] = 0.0
    J = (1.0 / (2 * m)) * (squaredErrors) + (lamda / (2 * m)) * (theta.T.dot(theta))
    grad = (1.0 / m) * (X.T.dot(h - y)) + (lamda / m) * theta

    return grad



I have reviewed these similar questions:

scipy.optimize.fmin_bfgs: “Desired error not necessarily achieved due to precision loss”

scipy.optimize.fmin_cg: "'Desired error not necessarily achieved due to precision loss.'

scipy is not optimizing and returns "Desired error not necessarily achieved due to precision loss"

But I still cannot find the solution to my problem. How can I get the minimization to converge instead of getting stuck at the start?

I solved this problem based on @lejlot's comments below. He is right: the values in the data set X were too large because I did not assign the normalized values back to the correct variable. Even though this is a small mistake, it shows where to look when encountering such problems: a Cost Function value this large suggests something is wrong with the data set.

The previous wrong one:

X_poly            = polyFeatures(X, p)
X_norm, mu, sigma = featureNormalize(X_poly)
X_poly            = c_[ones((m, 1)), X_poly]


The correct one:

X_poly            = polyFeatures(X, p)
X_poly, mu, sigma = featureNormalize(X_poly)
X_poly            = c_[ones((m, 1)), X_poly]


where X_poly is then actually used in the training as

cost, theta = trainLinearReg(X_poly, y, lamda)
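
featureNormalize itself is not shown in the question; a typical column-wise z-score normalization (this exact implementation is my assumption, written in plain Python for illustration) looks like:

```python
# Column-wise z-score normalization: subtract each column's mean and
# divide by its standard deviation, returning mu and sigma so the same
# transform can be applied to test data later.

def feature_normalize(rows):
    cols = list(zip(*rows))
    mu = [sum(c) / len(c) for c in cols]
    sigma = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5
             for c, m in zip(cols, mu)]
    normed = [[(x - m) / s for x, m, s in zip(row, mu, sigma)]
              for row in rows]
    return normed, mu, sigma

X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
X_norm, mu, sigma = feature_normalize(X)
print(mu)      # [2.0, 20.0]
print(X_norm)  # each column now has mean 0 and unit variance
```

The bug described above is then easy to see: computing `X_norm` but continuing to train on the un-normalized `X_poly` leaves the huge raw feature values in place, which is exactly what blows up the cost function.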


### CompsciOverflow

#### Consequences of factoring and discrete log in $P/Poly$

What is the consequence of factoring and discrete log being in $P/poly$?

### TheoryOverflow

#### Is fiedler vector of Laplacian real?

The Fiedler vector is the eigenvector of the Laplacian matrix corresponding to the second smallest eigenvalue (which is also known as the algebraic connectivity). The Laplacian of an undirected graph is defined as $D-A$, where $D$ is the diagonal degree matrix and $A$ is the adjacency matrix. I am interested in any results on the nature of the Fiedler vector: is it real?

### Lobsters

#### Mobile reddit site perf audit

“You’ll hate this one weird secret why m.reddit.com loads so slow…”

More seriously, there are some pretty graphs and a thorough walkthrough of debugging react performance in a real application.

(From July.)

### Wondermark

#### DIY Weekend: Kitchen spoon hooks

Last weekend, I made some spoon hangers for my kitchen!

Here is how I did this seemingly impossible task.

In our kitchen, we had a large ceramic crock to hold cooking spoons, tongs, and other large utensils. It was a mess!

Stuff was always spilling out, and whenever we’d get a new spatula or something, we’d have to shove it in there with everything else, and the stuff at the bottom would get all gross. OSHA even cited us for it, once.

When I noticed that every utensil (of course) has a hole in the handle, I decided to figure out how to hang the most commonly used pieces in a more accessible place, such as the side of a cabinet, which otherwise is just wasted space which is unacceptable.

We already do something similar with pots — we have a length of chain screwed along the wall, and we hang pots and pans from the links, using metal S-hooks. (The chain also comes in handy if our kitchen ever gets too icy to drive in safely.) So I figured there was probably a way to do something similar with utensils.

I Googled around to see how other people had solved this problem. The best tutorial I found recommended mounting bath towel bars, and hanging S-hooks from them. But the links in that article (to specific products at Ikea) were all expired, and those products weren’t offered anymore.

No matter. I don’t need to buy anything to make this happen!! We’ve fabricated all kinds of weird things at our new studio — it’s the type of place where, rather than buy a four-dollar toilet paper holder, I designed a version for my office roommate Jason to cut out of wood with his laser.

It took ten times the amount of work as screwing in a store-bought piece of plastic, but we made it, doggone it, and I point it out proudly to all our visitors.

As I was looking through some old scrap material, I realized I already had something that would work perfectly well for the kitchen: strips of wooden corner moulding.

I forgot to take a picture before I started, so this is a GENERIC IMAGE. This stuff is super cheap, though — a buck and a half per foot at Home Depot. Or, free, if someone leaves some behind after finishing some other job! As I think happened here!

I found a few scraps about a foot long each. I trimmed them to the size that would fit our cabinets, then measured out and drilled holes about an inch and a half apart.

The idea is that one side will be screwed into the cabinets, and the other will have a bunch of hooks dangling from it. The picture above shows them post-drilling, drying from a coat of stain. (Please don’t tell USPS I’m using a Priority Mail™ mailer for something other than mailing a Priority Mail™ shipment! The moisture-resistant Tyvek ensures any spilled stain doesn’t soak through to the tabletop.)

As I was drilling some of the holes, one piece of the wood threatened to split along its length. That wouldn’t do, so I cut some 1″ × 1.25″ pieces of 1/4″ MDF scrap and painted them brown.

Once the stain was dry, I glued the MDF pieces to the underside of each rail, flush with the side that would be mounted to the cabinet.

Since MDF is a composite material with no grain, and it’s really dense, this bracing should keep the wood from splitting, and also transfer into the wall any weight or downward pressure felt by the outer edges of the rail.

I mean, sure, these things are designed to hold incredibly lightweight items, but better safe than sorry — or as I like to say, better complicated than simple.

This is not always a good philosophy but it ends up kinda...happening a lot.

Jason’s work involves gluing stuff together all day every day, so in his bag of tricks I found these: backwards clothespins. They’re just regular clothespins in which the wooden pieces have been flipped upside down, and because this arrangement puts more tension on the spring, they hold super tightly. INCREDIBLE.

After drying the glue, a second coat of stain, and a coat of lacquer, they’re ready to hang! Luckily, our cabinet sides are solid plywood, so it was easy to screw directly into them.

The S-hooks came in a pack of 20 from Amazon. (OK, so I did have to buy something…but those convention-standard pipe-and-drape hooks would work just as well. So, start pocketing them next time you’re at Comic-Con.)

My search terms on Amazon were “the cheapest price for the largest package of entirely passable hooks which will be required to do very little work, and, if possible, can the item title be 37 words long?” Here is the result. They’re totally fine.

We mounted four of these rails on two different cabinet sides, and they work great! The bar for their performance was very low and they cleared it easily!!

Since the S-hooks are kind of wide, the spoons hang out from the wall a couple inches, in open air. This meant we could actually overlap things — as you can see in the above picture, the lower rail is actually mounted on the wall behind the longest spoons. So it’s possible to save some vertical space!

I dub this a SIMPLE PROJECT that is PRETTY HANDY. I get immense satisfaction from creating simple solutions to small problems! If you do it yourself, I hope it works as well for you, and makes you feel this alive.

Friends, I’ve come to the end of this post, and I just now realized I forgot to follow DIY Article Best Practices, and make it a 20-page slideshow in order to optimize ad views. I hope you can find it in your hearts to forgive me.