Formal grammars are the canonical means of describing a space of programs. The finite set of production rules that defines the space can also be used to sample programs from it. This process can be formulated as a reinforcement learning problem in which non-terminals are represented as states and production rules as actions. The central question then becomes how to represent a partially completed program effectively for a model that builds programs incrementally. This thesis investigates sampling programs from several domain-specific languages and constructing continuous embeddings of such programs for use in downstream machine learning tasks, including program expansion. Doc2vec-based embeddings are analyzed both qualitatively and quantitatively, a quantitative metric is developed for measuring how well embeddings preserve the structure of partial and complete programs, and the results are compared against other text-based embedding systems.
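The grammar-based sampling formulation can be sketched as follows. The toy arithmetic grammar, function names, and depth cap below are illustrative assumptions for exposition only, not the thesis's actual domain-specific languages; each non-terminal plays the role of a state, and each of its productions the role of an action:

```python
import random

# Toy context-free grammar (a hypothetical example): each non-terminal
# (state) maps to a list of productions (actions). Symbols in angle
# brackets are non-terminals; everything else is a terminal.
GRAMMAR = {
    "<expr>": [["<expr>", "+", "<expr>"],
               ["<expr>", "*", "<expr>"],
               ["<num>"]],
    "<num>": [["0"], ["1"], ["2"]],
}

def sample_program(symbol="<expr>", max_depth=5, rng=random):
    """Expand non-terminals left to right, choosing productions at random.

    Once max_depth is exhausted, the shortest production is chosen so
    that the recursion is guaranteed to terminate.
    """
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol: nothing left to expand
    productions = GRAMMAR[symbol]
    if max_depth <= 0:
        production = min(productions, key=len)  # force termination
    else:
        production = rng.choice(productions)    # random "action" choice
    return " ".join(sample_program(s, max_depth - 1, rng)
                    for s in production)

print(sample_program())
```

In this view, a partially completed program is a sentential form that still contains non-terminals, and a learned policy would replace `rng.choice` to steer which production expands the current non-terminal.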