Algebraic Statistics CC2010

From Macaulay2
Jump to: navigation, search

In the time before the workshop this group should discuss their goals and projects more precisely and narrow to a few key questions. There are general interest overlaps and thus all are listed under this general topic.

For example, one subset of this group is likely to work on the GraphicalModels package. Different groups of researchers/coders have been working on different aspects of the newly titled GrapicalModels package. One main goal of this workshop is to complete the update on the discrete models, integrate that work with the work done for normal distributions, and further the completion of a single useful package.

  • Question: GraphicalModels is currently for discrete models, right? What is the phrase "work done for normal distributions" referring to? (Sonja, 21.7.2010.)
  • Answer: GraphicalModels works, as it is, in theory, for discrete models (I put these qualifiers here as it is not well tested yet and probably needs work -- by me, of course), but Luis and his student Alex have been doing lots of work for normal distributions (Gaussian models) and one big goal Luis and I have for this workshop is to get all of that into GraphicalModels and get the whole package working in a way that makes it ready for public consumption. (Amelia, 21.7.2010)
  • Great. I'd be happy to help if needed. (Sonja, 29.7.2010)
  • Question: Where can I find the lastest version of GraphicalModels.m2? I'm looking at its predecessor, Markov.m2. Personally, I would like to have a function that gives the parametrization of a graphical model, because currently Markov.m2 only gives its Markov ideal. (Shaowei, 2.8.2010)
  • Note: the Betti numbers give (partial) information about Markov basis for a hierarchical model. What Eduardo and the "monomial ideals" group will do needs to be linked to AlgStats. (Sonja, 7.8.2010)

day 1

  • Amelia's update on current state of affairs: there is already a "Graphs.m2" package. Think about whether functions need to be in Graphs.m2 or GraphicalModels.m2.
  • What is GraphicalModels.m2? it is really just Luis' Markov.m2 updated to be compatible with Graphs.m2.
  • TO DO: go through GraphicalModels and figure out what works and what doesn't; if there are problems, most likely has to do with compatibility with Graphs.
  • Ultimately, have one GraphicalModels package, including a folder with different file for discrete, Gaussian ,etc.
  • Graphs.m2: TO DO: make a Bigraph. Then, mixedgraph will have 3 hashtables (graph, digraph and bigraph overlayed).

day 2

  • Topological ordering and DFS implemented in Graphs.m2; this can be used if we want to relabel the nodes in a Graph or Digraph within GraphicalModels.m2. Currently lables can be anything, but we needed a way to relabel to integers consistently.
  • Luis removed the ordering issue from almost all code from GraphicalModels.m2
  • Documentation started/continued for Graphs.m2 [Amelia]

day 3

  • Documentation issues started to be resolved for GraphicalModels.m2. Sonja in charge of this.
  • A lot of work is being done on the Gaussian graphical models part. Currently it sits in SEM.m2 (structural equation models), but it will be moved to GraphicalModels.m2 at the end.
  • GraphicalModels has code to work on Gaussian Bayesian networks, but SEM.m2 extends this to work for general Gaussian graphical models which are models based on graphs with directed and bidirected edges. Bidirected edges in this context play a different role than undirected edges. This is work of Shaowei, Alex and Luis.
  • GraphicalModels.m2 continues its transition to be compliant with the new Graphs package. Luis in charge of this.

day 4

  • we were so busy working, i forgot to update all the documentation changes. [sonja]
  • Documentation for GraphicalModels.m2 continues. Also, we have started to change the main

functions so that the package can handle general input graphs with vertices labeled arbitrarily. [Luis]

  • The main algebraic functions for Gaussian graphical models have been implemented. That is, the creation of

the ring and ideal corresponding to a SEM and also the main function to check parameter identification [Shaowei and Alex]

  • The MixedGraph structure started being implemented in Graphs.m2 to serve as the input structure for SEM.m2 [Luis]

day 5

  • Alex and Shaowei: We have been working on mixed graphs and structural equation modeling (SEM). The main work is converting Luis' Singular "Gaussian.lib" code to M2 code. The code is stored in SEM.m2, and will be merged with GraphicalModels.m2 when everything is ready.
    • we added a type 'MixedGraph' and its constructor in Graphs.m2
    • we added a type 'Bigraph' in SEM.m2, because it is specific to SEMs. It is really just a special type of Graph.
    • we wrote a function "identify" which checks if an SEM is identifiable or d-identifiable.
    • we wrote a function "trekSeparation" which computes the list of 'maximal' trek separation conditional independence statements up to symmetry.
    • we wrote a function "trekIdeal" which converts the trek separation statements into polynomial equations.
    • we added helper functions in Graphs.m2.
      • "pathConnected(A,G)" which computes the vertices which can be reached from a vertex set A by directed paths within the directed graph G.
      • "adjacencyHashTable" which creates a hash table instead of a matrix for the adjacency matrix, since the vertices may not be labeled by numbers.
  • Shaowei: Some issues with Graphs.m2 -
    • LabeledGraph does not seem to be working in Graphs.m2 because it is declared to be of type Graph, but it really is a hashTable with two keys "graphData" and "labels". Corrected this.
    • SimpleGraph returns a Graph, but the edges do not repeat. I think it is better to return a Digraph. I took the liberty of correcting this, but I hope this does not conflict with any other packages.
  • Sonja: tired of documenting all these things. more then 1/2 of the methods are documented. Luis has been reorganizing the code. I added functionality and optional arguments for markovRing, etc.

After day 5

  • Luis has been working alongside Amelia to define a proper type MixedGraph. This change and other changes done to Graphs.m2 will have an effect on the code written on SEM.m2. It will be the work of Shaowei and Alex to make their code compliant with these changes.
  • Luis has finally finished the transition of Markov.m2 to GraphicalModels.m2 using the package Graphs.m2. This means that GraphicalModels.m2 has all the capabilities of Markov.M2 but it works in the general context of graphs with arbitrary labels. There is still some work to be done to complete the transition regarding the code for Gaussian Bayesian networks. But this will be done in a few stages since some of this code will be super seeded by the more general code contained in SEM.m2. Luis will be in charge of these changes until done.
  • There is still some work to do documenting the package and writing the tests. Sonja will be in charge of it. Luis was supposed to advance on this issue but was busy fixing markovIdeal, markovMatrices and all the internal functions used by these methods. Regarding these internal functions, Luis is a bit confused on why they are required to be defined globally. Luis tried to change then to be local but the code would not run. This seems like a bug but it is unclear what is the source or how to fix it.
  • Very important: We have tried to have as much backward compatibility among GraphicalModels.m2 and Markov.m2. But due to the new feature of accepting more general graphs with arbitrarily labeled vertices, we were forced to add as input the Digraph in the method markovIdeal. Further, some ambiguity issues have risen in this regard, since markovRing does not require the digraph as input, so it might not be a priori clear what does p_{i_1...i_n} means. Particularly if the user inputs the digraph vertices in an unordered way, e.g., a,d,b,c. Since the internal code will try to order the vertices, p_{ijkl} would mean p(a=i,b=j,c=k,d=l) despite the fact that the labels were entered in a different order. For this reason, the user is adviced to always use compatible labels, i.e., labels that M2 can compare and order. There were some alternative solutions but this seemed to be the better one.

Status of Graphs.m2 [15.8.2010]

  • Completed:
    • All types, Graph, Digraph, Bigraph, MixedGraph, LabeledGraph are constructed to have a cache table. Functions making it easy to get at the hashtable that is the actual graph are implemented. For example, once G is constructed, use "graph G" to get at the hash table that is the actual graph. I am hoping to change this to "hashTable G" (for any number of reasons) but M2 won't let me do this yet.
    • simpleGraph, edges, vertices, descendents, and children are all written to both use the new type construction AND caches the information so that it is only ever computed once. In fact, if you run descendents it will fill out the children at the same time, so once this function is run, children will have to be computed after that.
    • "net" is written for all types and looks as desired (this dictates what the graphs look like to the user). and toString is written for all types, but I'm not entirely happy with it yet.
  • Needs Work: This is a list of functions that need to be updated to use the new construction of the graph types. Note that for many functions this is simply a matter of adding in a line looking at "graph G", possibly also checking the cache and/or adding something into the cache. The functions above serve as good examples. Working with MixedGraphs is a bit cumbersome and I'm going to talk with Mike and/or Dan about possible ways to do this more efficiently.
    • All graph display functions someone besides me needs to test these!
    • Write functions to ease access to the parts of MixedGraph and then use this in other functions. Access to the Graph is problematic due to choices made because hashTable does not work as expected. See internal documentation.
    • nondescendents
    • parents
    • neighbors
    • nonneighbors
    • foreFathers
    • removeNodes,
    • inducedSubgraph,
    • completeGraph,
    • cycleGraph,
    • SortedDigraph --- should this be a new type or should this information be in the cache? If it is a new type, still, what should be in the cache and what not?
    • topSort,
    • DFS,
    • isCyclic,
    • adjacencyMatrix,
    • degreeMatrix,
    • laplacianMatrix,
    • incidenceMatrix,
    • floydWarshall
    • reachable
    • adjacencyHashTable
    • All of GraphicalModels.m2
    • All of SEM.m2
  • Tests and Documentation
    • Documentation: We still have A LOT to do with this. However, since many functions are simply written for many different types and this generates lots of nodes, there are good ways to do this efficiently. When I looked at Graphs.m2 it was not using SimpleDoc quite the way it was meant to be used, particularly in this regard. I have fixed many of the existing nodes to be good examples. In particular, mixedGraph is a good example of how to efficiently write documentation nodes. Please follow this model (we may or may not agree on what is said, I am only stating that the M2 syntax is a good model).
    • I made a few tests, but have many many more to make still.
  • Update on Graph.m2 (Shaowei, 13 Sep 2010)
    • Changes
      • used "Graph","Digraph","Bigraph" as keys in MixedGraph instead of "class g","class d","class b"
      • removed "adjacencyHashTable", because i rewrote my code for another function so that it is not used
      • renamed "pathConnected" as "reachable", corrected it to work with new cache structure
      • documentation for "reachable"
      • made Bigraph a new type of Digraph (just like Graph)
      • "graph(HashTable)" constructor: it now accounts for Singletons
      • "labeledGraph" constructor: corrected it to work with new cache structure
      • "adjacencyMatrix": corrected it to work with new cache structure
      • cosmetic changes to "completeGraph" and "cycleGraph"
      • added comments to code in the constructors
      • changed all "class g === Graph" to "instance(g,Graph)", because in future, we want the same code to work with subclasses of Graph!
    • Comments
      • it seems to me that rather than let Graph and Bigraph be subclasses of Digraph, we should have a generic GraphType, and Digraph, Graph and Bigraph should be subclasses of GraphType. This is because in applications, it is hard to check if the input graph is really a directed graph, or just a Graph or Bigraph that happens to be a Digraph. We could just add a check that the graph is not a Graph or Bigraph, but this becomes bad later when other users start creating subclasses of Digraph in their applications.
      • i think "edges(Graph)" should return list of sets {A,B}, not list of lists {A,B}, because we want {A,B}=={B,A}
      • i think there is a bug in "nonneighbors" (compare with "neighbors")
      • we need to correct the errors Dan sent us about local variables in Graphs.m2. i corrected the one on "G1" but i don't know what's wrong with the rest.
      • what does the following code in Graphs.m2 do?
  expression Graph := (I) -> new FunctionApplication from { graph,
  unsequence apply(toSequence edges I, expression) }
  net Ideal := (I) -> net expression I
  toString Ideal := toExternalString Ideal := (I) -> toString expression I
      • [dan] These two lines should be removed, because they affect only how ideals are printed, which is the responsibility of the core of Macaulay2:
  net Ideal := (I) -> net expression I
  toString Ideal := toExternalString Ideal := (I) -> toString expression I
  • [Shaowei 9/15] "edges(Bigraph)" now returns a list of sets, not a list of lists. And the stuff on "expression Graph" and "net Ideal" has been commented out.

Status of GraphicalModels.m2

  • [Shaowei 9/14] I was getting some weird behavior from the "getPositionOfKeys" function. It does not return the position in lexicographic order (i.e. 4 could come before 1), so I added "sort" to the keys. hope this does not affect anything else much.
      • [Sonja] This should be tested on the functions that call getPositionOfKeys. I think they we added the function and did *not* sort the keys because we let Macaulay2 keep whatever ordering it had assigned to the keys the first time the hashtable was created. Have you tried to run gaussMinors to see if the output changes compared to the old version of the function? (see my note below for 9/16).
      • [Sonja 9/17] One of the uses of getPositionOfKeys is inside gaussIdeal; there the code had to be modified to call getPositionOfKeys for "graph G", not "G". This is also the case for gaussMatrices; markovIdeal; markovMatrices. It could be the case for some other functions as well, but I have not discovered this yet.
  • [Shaowei 9/15] I changed "gaussRing" to work with the new cache structure for graphs, Also, it now only returns variables in the SYMMETRIC matrix, i.e. s_(i,j) for i<j where "<" is the lexicographic ordering on the graph nodes. I hope this change does not break anything in GraphicalModels. In general, we need to be careful that we are using the lexicographic ordering on the graph nodes in the other functions in GraphicalModels, especially when referring to the variables, and when referring to the covariance matrix.
  • [Shaowei 9/15] I have added SEM to GraphicalModels, so I will delete the SEM file. We still need documentation and tests for:
    • paramRing
    • covMatrix
    • diMatrix
    • biMatrix
    • identify
    • trekSeparation [node created by Sonja 9/17]
    • trekIdeal [node created by Sonja 9/17]
      • Note that none of these functions had a typical output value specified!! I updated this for each of them. [Sonja, 9/17]
  • [Sonja 9/16] Fixed pairMarkovStmts and localMarkovStmts to make them compatible with Graphs.m2. bayesBall is not working, thus globalMarkovStmts isn't either. In order to be able to succesfully install the package, I commented out the examples in the documentation that call for globalMarkovStmts. In addition, I had to take out the "sort" from the "getPositionOfKeys" function -- I do not understand why, Shaowei, but we can fix this over the next few days. If I kept "sort" there, I got errors that I do not understand.
  • [Sonja 9/17] Status of documentation nodes for GraphicalModels.m2:
    • revision needed for "GraphicalModels" - the main documentation node!
    • nonexistent prior to 9/17 (in addition to the above list posted by Shaowei):
      • gaussIdeal[node created by Sonja 9/17]
      • gaussMatrices[node created by Sonja 9/17]
      • markovIdeal [node created by Sonja 9/17]
      • markovMatrices [node created by Sonja 9/17]
      • VariableNameBigraph [node created by Sonja 9/17]
      • VariableNameDigraph [node created by Sonja 9/17]
      • VariableNameCovariance [node created by Sonja 9/17]
    • nodes with example code lines that are waiting to be uncommented until globalMarkovStmts is fixed:
      • GraphicalModels
      • gaussIdeal
      • globalMarkovStmts
    • unfinished (needs description only and a meaningful example; doc node already exists):
      • marginMap --TO DO: definition missing.
      • hideMap --TO DO: definition missing.
      • markovIdeal --TO DO: need to insert definition of this ideal inside description
      • markovMatrices --TO DO: need to insert precise definition of these matrices inside description
      • trekIdeal --doc node exists. TO DO: insert the definition and an example
      • trekSeparation --doc node exists. TO DO: insert the definition and an example

Return to the CC2010 projects page Return to the main CC2010 page