Research
Information storage and management in networks: Suppose that information is encoded with a code of length $n$ over a finite alphabet $F$.
Coordinates of the codeword are stored at the vertices of a graph $G(V,E)$ with $|V|=n.$ This setting models distributed storage systems in which
servers are represented by the vertices, and the edges of the graph represent communication constraints in the system. The problems considered here include protocols for the recovery of erased data at the vertices, bounding the communication complexity of recovery, data recovery in the presence of corrupted nodes, dynamic data maintenance in a random environment, and other similar questions.
(a) If a server (vertex) becomes unavailable, the encoding supports recovery of the lost data, and we aim at constructing a code that allows recovery with low communication complexity. This problem can be studied from the perspective of
locally recoverable codes,
aiming to minimize the number of vertices contacted for data restoration,
or of regenerating codes where we optimize the total number of symbols passed along the edges (each hop contributes a unit to the complexity of recovery).
These problems can be analyzed both on deterministic graphs and on random graphs from standard ensembles such as ${\mathcal G}_{n,p},$ where we aim to find the threshold probability for vertex recovery. Interesting versions of these problems arise when some functional nodes become corrupted and provide incorrect information about their contents.
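As a toy illustration of the locality requirement (a hypothetical brute-force check, not one of the constructions alluded to above), the following sketch computes, for a small binary code, the smallest number of other coordinates that always suffices to restore a given coordinate:

```python
from itertools import combinations, product

# A toy binary code of length 4: the linear span of 1100 and 0011
# (a hypothetical example chosen only to illustrate the locality computation).
gens = [(1, 1, 0, 0), (0, 0, 1, 1)]
n = len(gens[0])
code = {
    tuple(sum(c * g[j] for c, g in zip(coeffs, gens)) % 2 for j in range(n))
    for coeffs in product((0, 1), repeat=len(gens))
}

def locality(i):
    """Smallest r such that coordinate i is recoverable from some fixed set of r other coordinates."""
    for r in range(1, n):
        for S in combinations([j for j in range(n) if j != i], r):
            seen = {}
            # i is recoverable from S if no two codewords agree on S but differ in position i
            if all(seen.setdefault(tuple(c[j] for j in S), c[i]) == c[i] for c in code):
                return r
    return None  # two codewords differ only in position i; recovery is impossible

print([locality(i) for i in range(n)])  # -> [1, 1, 1, 1]: each bit is a copy of its partner
```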
(b) Suppose now that the graph $G$ is infinite, for instance, $G={\mathbb Z},$ and we write bi-infinite sequences on the vertices with the
constraint that any $k$-tuple of consecutive symbols is a function of its $l$-neighborhood in the graph. The first question is to find the growth
rate of this set of sequences, and it can be addressed via a connection with constrained systems.
There are multiple possible extensions of this problem, such as a probabilistic relaxation of the recovery
constraint. The analysis relies on methods from constrained systems, symbolic dynamics, and entropy theory. These problems become even more interesting if instead of ${\mathbb Z}$ we store information in higher dimensions, e.g., $G={\mathbb Z}^2,$ linking this subject with Markov random fields.
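As a minimal sketch of the transfer-matrix method from constrained systems (shown here for the classical binary $(1,\infty)$ run-length constraint, used purely as an illustration rather than the specific recovery constraint above), the growth rate of the set of admissible sequences is the logarithm of the Perron eigenvalue of the adjacency matrix of a presentation graph:

```python
import numpy as np

# Binary sequences on Z with no two consecutive 1s: states record the last symbol
# written, and the allowed transitions form the adjacency matrix below.
A = np.array([[1, 1],   # after a 0 we may write either 0 or 1
              [1, 0]])  # after a 1 we may only write 0
lam = max(np.linalg.eigvals(A).real)               # Perron eigenvalue (the golden ratio)
print("growth rate (bits/symbol):", np.log2(lam))  # ~ 0.6942
```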
(c) Another interesting class of problems, formulated here, starts with placing bits on the vertices of a finite graph (one bit per vertex) with the condition that every vertex satisfies a parity check that involves all of its neighbors. The question is to find the size of the largest code that satisfies this condition. It turns out that this question makes the most sense if the graph has no triangles. Surprisingly, for some families of triangle-free graphs it is possible to construct codes of rate asymptotically approaching one; see here and here.
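Since the parity checks are linear, the valid labelings form a binary linear code, and their number can be computed from the GF(2) rank of the adjacency matrix. A small sketch, under the assumption that the check at a vertex is the XOR of the bits on its (open) neighborhood, illustrated on the triangle-free graph $C_5$:

```python
import numpy as np

def gf2_rank(M):
    """Rank of a 0/1 matrix over GF(2) by Gaussian elimination."""
    M = M.copy() % 2
    rank = 0
    for c in range(M.shape[1]):
        pivot = next((r for r in range(rank, M.shape[0]) if M[r, c]), None)
        if pivot is None:
            continue
        M[[rank, pivot]] = M[[pivot, rank]]          # move the pivot row up
        for r in range(M.shape[0]):
            if r != rank and M[r, c]:
                M[r] ^= M[rank]                      # eliminate the rest of the column
        rank += 1
    return rank

# Triangle-free example: the 5-cycle C5.  The check at vertex v is assumed to be
# the XOR of the bits at its two neighbors; valid labelings = GF(2) kernel of A.
n = 5
A = np.zeros((n, n), dtype=np.uint8)
for v in range(n):
    A[v, (v + 1) % n] = A[v, (v - 1) % n] = 1

dim = n - gf2_rank(A)
print("number of valid labelings:", 2 ** dim)        # -> 2 (all-zeros and all-ones)
```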
Quantum codes, in particular, qubit codes and their transversal logical gates; permutation-invariant codes for correcting deletions and amplitude damping noise; absorption/emission codes for protection against photon loss/gain noise.
Codes and uniform distributions: Here one is interested in characterizing binary codes and codes in other
finite metric spaces that approximate the uniform distribution on the space. Applications of such codes could include derandomizing algorithms,
approximation theory, probability of decoding error, image processing, and concept learning (uniform laws of large numbers and VC dimension).
These problems are also connected with constructing energy-minimizing configurations in metric spaces.
(a) Smoothing of codes: We say that a (radial) function $r:\{0,1\}^n\to {\mathbb R}_0^+$ is a noise kernel if it acts on functions on $\{0,1\}^n$ by convolution: $(r\ast f)(x)=\sum_{z\in \{0,1\}^n} r(z)f(x-z).$ In particular, if $f(x)=\frac1{|C|}{\bf 1}_{C}(x)$ is the normalized indicator function of the code $C$, then under the action of $r$ the code becomes more "uniformly distributed" in terms of the distance to the uniform pmf $P(x)=2^{-n}, x\in\{0,1\}^n$.
One popular choice of the noise operator is the Bernoulli noise, defined as
$$
(T_{\beta_\delta}C)(x):=\frac 1{|C|} \sum_{y\in\{0,1\}^n} {\bf 1}_{C}(x+y)\delta^{|y|}(1-\delta)^{n-|y|}
$$
for $\delta\in(0,1).$ It is clear that a small code $C$ will resist smoothing unless $\delta$ is close to $1/2$, while a large code is more amenable to being made uniform. Thus, for a given noise level $\delta$ there is a threshold rate, called the smoothing capacity, below which asymptotically perfect smoothing is not possible.
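A minimal numerical sketch of the operator $T_{\beta_\delta}$ above (the block length, the noise level, and the randomly drawn code are arbitrary choices made only for illustration), measuring how far the smoothed distribution is from uniform in total variation:

```python
import itertools
import random

n, delta = 10, 0.11
space = list(itertools.product((0, 1), repeat=n))

# An arbitrary code of 32 points, chosen at random purely for illustration.
random.seed(1)
code = random.sample(space, 32)

def smoothed_mass(x):
    """(T_{beta_delta} C)(x): mass at x after passing the code through Bernoulli(delta) noise."""
    total = 0.0
    for c in code:
        d = sum(a != b for a, b in zip(x, c))        # Hamming distance d(x, c) = |x + c|
        total += delta ** d * (1 - delta) ** (n - d)
    return total / len(code)

# Total variation distance between the smoothed pmf and the uniform pmf 2^{-n}.
tv = 0.5 * sum(abs(smoothed_mass(x) - 2 ** (-n)) for x in space)
print("TV distance to uniform:", tv)
```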
Why would one want to make a code uniform? For instance, to make our transmissions indistinguishable from one another to an eavesdropper. This gives rise to interesting connections to thresholds for transmission over a wiretap channel [1],
[2].
This connection and many other related results for smoothing are discussed in detail in my recent paper, and they also connect the smoothing question to a more general problem in information theory known as channel resolvability.
The smoothing problem can also be stated for spherical codes as well as for lattices, and it has interesting connections with finding moments of the number of code points in a ball and with list decoding
of codes and lattices.
(b) Another version of problems of this kind seeks to minimize quadratic (spherical) discrepancy of binary codes. Let $C_N=\{z_1,\dots,z_N\}\subset\{0,1\}^n$ be a binary code, and define
$$
D^{L_2}(C_N)=\sum_{t=0}^n \sum_{x\in \{0,1\}^n}\Big(\frac 1N \sum_{j=1}^N {\bf 1}_{B(x,t)}(z_j)-\frac1{2^n}|B(x,t)|\Big)^2.
$$
Since the inner sum is proportional to the variance of the number of codewords in a ball with a random center, the discrepancy can indeed be viewed as a measure of the code's uniformity. It turns out that in the binary case this quantity can be evaluated explicitly from the distance distribution of the code. Some properties and bounds for $D^{L_2}(C_N)$ that depend only on $n$ and $N$ are obtained here and here, and they are also connected with a classical theorem known as
Stolarsky's invariance principle.
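A direct brute-force evaluation of $D^{L_2}(C_N)$ from the definition above, for a toy code (the even-weight code of length 5, an arbitrary choice made only for illustration):

```python
import itertools
from math import comb

def l2_discrepancy(code, n):
    """Quadratic discrepancy D^{L2}(C_N), computed directly from the definition."""
    N = len(code)
    space = list(itertools.product((0, 1), repeat=n))
    total = 0.0
    for t in range(n + 1):
        vol = sum(comb(n, s) for s in range(t + 1)) / 2 ** n   # |B(x,t)| / 2^n
        for x in space:
            # fraction of codewords falling in the ball B(x, t)
            frac = sum(sum(a != b for a, b in zip(x, z)) <= t for z in code) / N
            total += (frac - vol) ** 2
    return total

# Toy code: all even-weight words of length 5.
n = 5
code = [c for c in itertools.product((0, 1), repeat=n) if sum(c) % 2 == 0]
print(l2_discrepancy(code, n))
```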
Other problems:
- Algebraic constructions of codes for storage
[1],
[2]
- Codes with few distances [1], [2]
- Private distribution estimation
[1]
Continual support of the US National Science Foundation is gratefully acknowledged.