Wednesday, June 10. 2009
Our paper on StringBorg is being published by Science of Computer Programming:
M. Bravenboer, E. Dolstra, and E. Visser. Preventing Injection Attacks with Syntax Embeddings. A Host and Guest Language Independent Approach. Science of Computer Programming, 2009.
StringBorg is a technique for embedding 'string' languages in general purpose languages in a safe way, to avoid injection attacks.
The paradigmatic example is the embedding of SQL queries, which typically is done using string literals as in the following example:
String userName = getParam("userName");
String password = getParam("password");
String query = "SELECT id FROM users "
    + "WHERE name = '" + userName + "' "
    + "AND password = '" + password + "'";
if (executeQuery(query).size() == 0)
    throw new Exception("bad user/password");
With this approach it is very easy to forget to escape SQL meta-characters in the values obtained from the client. This opens the
door to an injection attack: crafted input that breaks out of the query the programmer intended and changes its structure.
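To make the problem concrete, here is a minimal sketch of what goes wrong. The class name InjectionDemo and the input values are made up for illustration; the concatenation itself is the same as in the example above. In most SQL dialects "--" starts a comment, so the password check would be dropped from the resulting query.

```java
public class InjectionDemo {
    // Builds the query exactly as in the example above:
    // plain string concatenation, no escaping of client input.
    static String buildQuery(String userName, String password) {
        return "SELECT id FROM users "
             + "WHERE name = '" + userName + "' "
             + "AND password = '" + password + "'";
    }

    public static void main(String[] args) {
        // Hypothetical malicious input: the quote closes the string
        // literal early, and "--" comments out the rest of the query.
        String query = buildQuery("admin' --", "anything");
        System.out.println(query);
        // prints: SELECT id FROM users WHERE name = 'admin' --' AND password = 'anything'
    }
}
```

The database no longer sees a name and a password comparison, but a name comparison followed by a comment.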
StringBorg prevents such attacks by syntactically embedding the query language in the host language. For example, the query
above can then be written as follows:
SQL q = <| SELECT id FROM users
WHERE name = ${userName} AND password = ${password} |>;
if (executeQuery(q.toString()).size() == 0) ...
Now, the syntax of the query is checked statically. But more importantly, at run-time the query is constructed by a query
API that ensures that the query constructed has the same syntactic structure as the one defined by the programmer.
Furthermore, it enforces escaping meta-characters in values spliced into the query, thus guaranteeing that no injection
attacks can occur.
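The idea of a query API that escapes spliced values can be sketched in plain Java. This is not the API that StringBorg generates; the class SqlSketch and its methods fragment and stringValue are hypothetical names, and the sketch only handles string values by doubling single quotes, which is the standard SQL way to escape a quote inside a string literal. Some databases treat other characters (such as backslash) specially as well, so a real implementation would need dialect-specific escaping.

```java
public final class SqlSketch {
    private final StringBuilder text = new StringBuilder();

    // Appends a fixed query fragment written by the programmer.
    public SqlSketch fragment(String sql) {
        text.append(sql);
        return this;
    }

    // Splices a client-supplied value in as a SQL string literal.
    // Single quotes are doubled, so the value cannot end the literal.
    public SqlSketch stringValue(String value) {
        text.append('\'').append(value.replace("'", "''")).append('\'');
        return this;
    }

    @Override
    public String toString() {
        return text.toString();
    }

    public static void main(String[] args) {
        String q = new SqlSketch()
            .fragment("SELECT id FROM users WHERE name = ")
            .stringValue("admin' --")
            .fragment(" AND password = ")
            .stringValue("anything")
            .toString();
        System.out.println(q);
    }
}
```

Because client input can only enter through stringValue, the malicious name stays inside its string literal and the password comparison remains part of the query. What the embedding adds on top of such an API is that the programmer writes the query in SQL syntax and the calls are generated, rather than written by hand.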
The paper does not just provide a solution for embedding SQL in Java, but offers a generic approach for embedding any
guest language in any host language with little more effort than providing syntax definitions for host and guest language.
Abstract: Software written in one language often needs to construct sentences in another language, such as SQL queries, XML output, or shell command invocations. This is almost always done using unhygienic string manipulation, the concatenation of constants and client-supplied strings. A client can then supply specially crafted input that causes the constructed sentence to be interpreted in an unintended way, leading to an injection attack. We describe a more natural style of programming that yields code that is impervious to injections by construction. Our approach embeds the grammars of the guest languages (e.g. SQL) into that of the host language (e.g. Java) and automatically generates code that maps the embedded language to constructs in the host language that reconstruct the embedded sentences, adding escaping functions where appropriate. This approach is generic, meaning that it can be applied with relative ease to any combination of context-free host and guest languages.
Saturday, May 30. 2009
For the participants of our hands-on tutorial on
Creating DSLs with Stratego/XT
at the
Code Generation 2009 conference,
we (i.e. Rob Vermaas) created a
VirtualBox
image with all the software needed during the tutorial.
In particular, it contains a full installation of
Stratego/XT
with compiler, libraries, and auxiliary packages such as java-front.
In addition, the image contains a built-from-source installation of
the WebDSL
language for building web applications, and an installation of Tomcat
for deploying created web apps.
The image and instructions for its installation are available from
http://strategoxt.org/Stratego/CodeGeneration2009Tutorial
During the tutorial additional material will be handed out with concrete exercises.
The virtual machine is also useful for exploring Stratego/XT and WebDSL outside the context of this particular tutorial.
Saturday, May 23. 2009
Spoofax/IMP is a toolset for the creation of interactive development environments for custom languages based on domain-specific languages for editor services. The toolset is especially aimed at the developers of domain-specific languages, allowing them to provide IDE support for their specialist language under development. An important feature of Spoofax/IMP is the support for language composition, i.e. for languages consisting of multiple, syntactically different, sub-languages. Furthermore, the toolset allows the customization of heuristically generated editor services without losing the ability to regenerate these services when a language evolves.
At LDTA 2009 we presented a paper about Spoofax/IMP. The final version of that paper is now finished, and a pre-print is available.
L. C. L. Kats, K. T. Kalleberg, and E. Visser. Domain-Specific Languages for Composable Editor Plugins.
In T. Ekman and J. Vinju, editors, Proceedings of the Ninth Workshop on Language Descriptions, Tools, and Applications (LDTA 2009),
Electronic Notes in Theoretical Computer Science. Elsevier Science Publishers, April 2009.
Abstract:
Modern IDEs increase developer productivity by incorporating many
different kinds of editor services. These can be purely syntactic,
such as syntax highlighting, code folding, and an outline for
navigation; or they can be based on the language semantics, such as
in-line type error reporting and resolving identifier declarations.
Building all these services from scratch requires both the extensive
knowledge of the sometimes complicated and highly interdependent APIs
and extension mechanisms of an IDE framework, and an in-depth
understanding of the structure and semantics of the targeted language.
This paper describes Spoofax/IMP, a meta-tooling suite that provides
high-level domain-specific languages for describing editor services,
relieving editor developers from much of the framework-specific
programming. Editor services are defined as composable modules of
rules coupled to a modular SDF grammar. The composability provided by
the SGLR parser and the declaratively defined services allows embedded
languages and language extensions to be easily formulated as
additional rules extending an existing language definition. The
service definitions are used to generate Eclipse editor plugins. We
discuss two examples: an editor plugin for WebDSL, a domain-specific
language for web applications, and the embedding of WebDSL in
Stratego, used for expressing the (static) semantic rules of WebDSL.
Monday, May 11. 2009
We just got the notification that our submission to OOPSLA 2009 has been accepted.
The paper presents a solution to error recovery for the SGLR parsing algorithm. Here's the full citation and abstract (pre-print will follow later):
Lennart C. L. Kats, Maartje de Jonge, Emma Nilsson-Nyman, and Eelco Visser.
"Providing Rapid Feedback in Generated Modular Language Environments. Adding Error Recovery to Scannerless Generalized-LR Parsing"
In Gary T. Leavens, editor, Proceedings of the 24th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2009), New York, NY, USA, October 2009. ACM. (to appear).
Abstract: Integrated Development Environments (IDEs) increase programmer
productivity, providing rapid, interactive feedback based on the
syntax and semantics of a language. A heavy burden lies on developers
of new languages to provide adequate IDE support. Code generation
techniques provide a viable, efficient approach to semi-automatically
produce IDE plugins. Key components for the realization of plugins are
the language's grammar and parser. For embedded languages and
language extensions, constituent IDE plugin modules and their grammars
can be combined. Unlike conventional parsing algorithms, scannerless
generalized-LR parsing supports the full set of context-free grammars,
which is closed under composition, and hence can parse language
embeddings and extensions composed from separate grammar modules. To
apply this algorithm in an interactive environment, this paper
introduces a novel error recovery mechanism, which allows it to be
used with files with syntax errors -- common in interactive
editing. Error recovery is vital for providing rapid feedback in case
of syntax errors, as most IDE services depend on the parser -- from
syntax highlighting to semantic analysis and cross-referencing. We
base our approach on the principles of island grammars, and
automatically generate new productions for existing grammars, making
them more permissive of their inputs. To cope with the added
complexity of these grammars, we adapt the parser to support
backtracking. We evaluate the recovery quality and performance of our
approach using a set of composed languages, based on Java and
Stratego.
Saturday, May 2. 2009
I have been playing around for a couple of minutes with yUML, an online service by Tobin Harris for creating UML diagrams using a textual input language.
The diagram below is generated while you load this page.
The input needed to generate the diagram is the following list of relations:
[Publication]++->*[Author],
[AbstractAuthor]^[Author],
[AbstractAuthor]*<->1[Person],
[AbstractAuthor]*<->1[Affiliation],
[Person]*->*[Publication],
[Publication]^[PrintPublication],
[PrintPublication]^[Article],
[PrintPublication]^[InProceedings],
[Publication]^[PublishedVolume],
[PublishedVolume]^[Proceedings],
[InProceedings]<->[Proceedings],
[AbstractAuthor]^[Editor],
[PublishedVolume]++->*[Editor],
[Person]*->*[PublishedVolume],
[PublishedVolume]^[Book],
[PrintPublication]^[InCollection],
[Book]<->*[InCollection] .
This diagram documents a (small) subset of the data model underlying the researchr.org application for bibliography sharing and reviewing.
Thursday, April 30. 2009
I have been invited to give a talk at the
Sixth International Workshop on Web Information Systems Modeling (WISM 2009),
which will be held in June in Amsterdam (co-located with CAiSE 2009).
Here's the abstract I wrote for the talk.
Abstract:
In this talk I give an overview of the design and application of
WebDSL, a domain-specific language for data centric web applications.
WebDSL linguistically integrates the definition of data models, user
interfaces, actions, access control rules, data validation rules,
styling rules, and workflow definitions. While maintaining separation
between these concerns through specialized sub-languages, linguistic
integration ensures static consistency checking and correct code
generation. The language allows developers to concentrate on the
essential design of web applications, abstracting from accidental
complexity, such as the details of data persistence. The combination
of high-level and low-level constructs ensures high expressivity,
while supporting customization to application requirements. The
application of WebDSL is illustrated using the researchr.org
application for bibliography sharing and reviewing.
Monday, April 13. 2009
It was probably unavoidable. Tweetie app on iPod makes it easy. It seems I'm tweeting.
Blogging for the lazy. Let's see if I have more to say, or more frequently at least, on twitter than on this blog.
Monday, April 6. 2009
At the Code Generation 2009 conference,
Lennart Kats and I will give a hands-on tutorial
about building DSLs with Stratego/XT. The program says it thus:
Stratego/XT is a state-of-the-art language and toolset for the development of domain-specific language implementations. In this session the participants learn to use Stratego/XT, by developing a generator for a small DSL for web applications generating PHP. The tutorial covers declarative syntax definition with SDF, code generation by model transformation, and model-to-model transformation by rewriting. The tutorial is based on a course in model-driven software development developed at Delft University of Technology.
NB Since this is a hands-on session places are strictly limited. Please let us know whether you plan to attend this session when you book your conference place. Places will be allocated on a first-come first-served basis.