[statnet_help] Basis for simulate and GOF
martina morris
morrism at uw.edu
Sun Aug 9 12:56:47 PDT 2020
Hi Adam,
> Some background: I’ve fitted a series of models of increasing complexity using ergm, and estimated the “mean
> connectivity” (the marginal of each of the edges in the network) estimated by each of them:
>
> model.i <- ergm(g ~ ...)
> samples <- lapply(simulate(model.i, nsim=500), FUN=as.matrix)
> mean_samp <- as.matrix(Reduce("+", samples) / length(samples))
it would actually help to see what your model terms are.
> I also conducted similar GOF comparison using the gof function. For some models, the mean of the samples (as
> well as the distribution of different statistics in the gof function) was suspiciously similar to the original
> network. Specifically, the models seemed to capture different “inhomogeneous”/“symmetry breaking” properties
> of the original network that they shouldn’t be able to, according to their terms/covariates. My current
> understanding is that this resulted from the default behaviour of “simulate”, which started from
> the original data matrix, and simply “didn’t get far enough”.
there are a couple of possible explanations for this:
1. one is how strongly the model constrains the tie distribution. in some
cases, these constraints can severely reduce the sample space of networks
defined by the model. in that case, it becomes very difficult to take a
"step" in the MCMC process, because the probability of a change (to the
sufficient stats in the model) is so low for most proposals.
2. the other is that once you "pin down" some key lower order properties
(like the degree distribution and mixing by nodal attributes) many of the
higher order graph properties (like component size and geodesic
distributions) are also often constrained. you may change the individual
node-id in a specific position, but the positions are fairly stable. we
see this a lot in our infectious disease modeling simulations. there's a
good example in this paper in figure 2:
Krivitsky, P. N. and M. Morris (2017). "Inference for social network
models from egocentrically sampled data, with application to understanding
persistent racial disparities in HIV prevalence in the US." Annals of
Applied Statistics 11(1): 427-455. doi:10.1214/16-aoas1010
https://projecteuclid.org/euclid.aoas/1491616887
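
One rough way to tell the two explanations apart is to check how far, tie by tie, the simulated networks have moved from the observed one. The sketch below is only an illustration: it uses the florentine data shipped with ergm and an edges + triangle model as stand-ins (assumptions, not the model under discussion), and `hamming` is just an illustrative name. Hamming distances near zero would point to a chain that barely moved (#1); large distances combined with stable summary statistics would point to structural constraint (#2).

```r
library(ergm)                        # also attaches the network package

data(florentine)                     # example data shipped with ergm
g <- flomarriage                     # stand-in for the observed network

set.seed(1)
fit <- ergm(g ~ edges + triangle)    # stand-in model, not the poster's terms

# default behaviour: the chain starts from the observed network
sims <- simulate(fit, nsim = 200)

obs <- as.matrix(g)                  # observed adjacency matrix

# number of dyads whose tie status differs from the observed network
# (divide by 2 because the network is undirected)
hamming <- sapply(sims, function(s) sum(as.matrix(s) != obs) / 2)

# per-dyad tie frequency across samples (the "mean connectivity")
mean_samp <- Reduce("+", lapply(sims, as.matrix)) / length(sims)

summary(hamming)

# how closely the dyad-level marginals track the observed ties
cor(mean_samp[upper.tri(mean_samp)], obs[upper.tri(obs)])
```

For a directed network the division by 2 should be dropped.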
> I’d really appreciate your answers for the following two questions:
> * Is there any quantitative way to measure that the samples generated by “simulate” (or “gof”) are
> “independent enough” for reliable estimation of single edge marginals or other statistics of interest?
have you looked at the MCMC diagnostics from your fits? i'd suggest
starting there. if the trace plots show visible serial correlation, that
suggests #1 above, and you might want to increase your MCMC.interval. what
you want to see is a "fuzzy caterpillar": a trace that bounces around a
stable mean with no trend or drift. if you already have that for all of
your models, that suggests the issue is #2 above.
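
A minimal sketch of that check, again with the florentine data and an edges + triangle model as stand-ins (assumptions for illustration only). It looks at the fit's own diagnostics and then applies the same idea to simulate(): draw the model statistics at two thinning intervals and compare effective sample sizes using coda. The helper `draw_stats` and the interval values 10 and 1000 are hypothetical choices, and `output = "stats"` assumes ergm 3.10 or later.

```r
library(ergm)
library(coda)

data(florentine)
set.seed(1)
fit <- ergm(flomarriage ~ edges + triangle)   # stand-in model

# trace and density plots from the fit; look for trend or slow drift
mcmc.diagnostics(fit)

# hypothetical helper: simulate statistics only, at a given thinning interval
draw_stats <- function(interval) {
  simulate(fit, nsim = 500, output = "stats",
           control = control.simulate.ergm(MCMC.interval = interval))
}

s_short <- as.matrix(draw_stats(10))
s_long  <- as.matrix(draw_stats(1000))

# effective sample size per statistic; values close to nsim (500)
# indicate draws that are close to independent
effectiveSize(as.mcmc(s_short))
effectiveSize(as.mcmc(s_long))

# lag-1 autocorrelation of the edge count at the longer interval
acf(s_long[, "edges"], plot = FALSE)$acf[2]
```

An effective sample size well below the number of draws is a quantitative version of "visible serial correlation". The same calculation can be applied to any statistic passed through the `monitor` argument of simulate().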
> * Are there any obvious drawbacks to supplying a “naive” basis (like a Bernoulli graph with the same density)
> for simulate and/or gof?
just time. keep in mind the MCMC burnin time will need to be increased so
that you can reach the target statistics. but you can try this and see
whether, once you do reach the targets, you find the same lack of
variation in the higher order stats. if so, that again points to #2
above.
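
A sketch of that experiment for simulate() only, under the same stand-in assumptions (florentine data, edges + triangle model). The Bernoulli starting graph is drawn from an edges-only model whose coefficient is the logit of the observed density, so it matches the density in expectation rather than exactly, and it keeps the vertex attributes that nodal covariate terms would need. The burn-in and interval values are arbitrary placeholders that would need tuning.

```r
library(ergm)

data(florentine)
g <- flomarriage                     # stand-in for the observed network
set.seed(1)
fit <- ergm(g ~ edges + triangle)    # stand-in model

# Bernoulli starting graph with the same expected density as g;
# simulating on g keeps its vertex set and vertex attributes
d  <- network.density(g)
g0 <- simulate(g ~ edges, coef = qlogis(d), nsim = 1)

# start the chain from g0, with a longer burn-in to reach the target stats
stats0 <- simulate(fit, nsim = 200, basis = g0, output = "stats",
                   control = control.simulate.ergm(MCMC.burnin   = 1e5,
                                                   MCMC.interval = 1e3))

# compare simulated means with the observed statistics
rbind(observed  = summary(g ~ edges + triangle),
      simulated = colMeans(as.matrix(stats0)))
```

If the simulated means sit close to the observed statistics, the chain has reached the targets, and any remaining lack of variation in higher order statistics is unlikely to come from the starting point.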
best,
mm
> Respectfully,
> Adam Haber
****************************************************************
Professor Emerita of Sociology and Statistics
Box 354322
University of Washington
Seattle, WA 98195-4322
Office: (206) 685-3402
Dept Office: (206) 543-5882, 543-7237
Fax: (206) 685-7419
morrism at u.washington.edu
http://faculty.washington.edu/morrism/