On The Road ZJL


Sunday, November 9, 2008

Fabulous Adventures In Coding : SimpleScript



Tags: SimpleScript






SimpleScript Part Seven: Binder Skeleton
May 4, 2004 22:51




In Part Five I was discussing modules: there is a "global module" and any number of additional modules. Each module is associated with a named item, and the only module which is associated with more than one named item is the global module. This means that each module is going to need its own name table to keep track of the functions, the variable names and values, etc, in it. We'll call such devices "binders" because they bind names to values.



For reasons which will become more clear in future episodes, it is convenient to have the binder implement the IDispatch interface. Normally if you had to implement IDispatch you'd build a type library, and then have your implementation of IDispatch call through to the ITypeInfo methods which look up dispatch ids, invoke methods, and so on. There's a good reason for that: rolling your own IDispatch code can be extremely tricky. But, we're pretty much stuck with it here -- we haven't got the convenience of a known vtable interface to wrap a type library around at compile time. The script engine name tables are going to be extremely dynamic, so we're going to have to roll our own IDispatch.



The code isn't nearly done yet, obviously, but I've got a good skeleton in place. Take a look at binder.cpp for the details. I've pretty much got the high-level semantics of the dispatch interface worked out, but the actual low-level implementation details all return E_NOTIMPL. What I need to build here is a special hash table which can efficiently look up values by both numeric id and name. That's not too challenging; what is going to be more difficult is implementing the SCRIPTITEM_GLOBALMEMBERS feature. We need to be able to add another arbitrary dispatch object to our binder and dynamically aggregate all of its methods, without causing any collision in dispatch ids!
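To make the "look up by numeric id or by name" requirement concrete, here is a minimal C++ sketch of such a two-way table. It is not the code in binder.cpp: the `NameTable` class, the `DispId` alias, and the choice to start ids at 1 are assumptions for illustration. A real binder would store VARIANTs and, for VBScript, would need case-insensitive name comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

using DispId = long;  // stand-in for COM's DISPID

// Hypothetical two-way name table: a hash map gives name -> id, and a
// vector indexed by (id - kFirstId) gives id -> entry in constant time.
template <typename Value>
class NameTable {
public:
    // Ids start at 1 so that 0 stays free; COM reserves 0 (DISPID_VALUE)
    // for an object's default member.
    static constexpr DispId kFirstId = 1;

    // Binds name to value. Rebinding an existing name keeps its id, so ids
    // handed out earlier (e.g. by GetIDsOfNames) stay valid.
    DispId Bind(const std::wstring& name, const Value& value) {
        auto it = idsByName_.find(name);
        if (it != idsByName_.end()) {
            entries_[Index(it->second)].value = value;
            return it->second;
        }
        DispId id = kFirstId + static_cast<DispId>(entries_.size());
        entries_.push_back(Entry{name, value});
        idsByName_.emplace(name, id);
        return id;
    }

    // Name -> id: the GetIDsOfNames direction.
    std::optional<DispId> IdOf(const std::wstring& name) const {
        auto it = idsByName_.find(name);
        if (it == idsByName_.end()) return std::nullopt;
        return it->second;
    }

    // Id -> value: the Invoke direction. Returns nullptr for unknown ids.
    const Value* ValueOf(DispId id) const {
        if (id < kFirstId || Index(id) >= entries_.size()) return nullptr;
        return &entries_[Index(id)].value;
    }

private:
    struct Entry {
        std::wstring name;  // kept so id -> name is available, e.g. for errors
        Value value;
    };

    static std::size_t Index(DispId id) { return static_cast<std::size_t>(id - kFirstId); }

    std::vector<Entry> entries_;
    std::unordered_map<std::wstring, DispId> idsByName_;
};
```

Keeping ids dense and stable is the design choice that makes the id direction a plain array index; the hash map only has to serve name lookups.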



Default property semantics are a little weird. In the wacky world of late bound objects, an object may have a "default" method. This feature was created so that you could call the Item method of a collection the same way you dereference an array in Visual Basic -- you make it look like a function call. That is, since this code works:



Dim arr(10)
arr(1) = 123



then so should this code



Set dict = CreateObject("Scripting.Dictionary")
dict(1) = 123



The last line is the same as



dict.item(1) = 123



Such is the wackiness of VB -- because the array dereference and function call syntaxes are conflated, they ended up making calls to invisible default methods legal on the left-hand side of an assignment statement. The mind fairly boggles. We'll be good citizens in SimpleScript. If we have an object in our binder and someone attempts to call it, we'll just pass the arguments on to its default method and let it sort out what the right thing to do is.
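A minimal sketch of that "pass it on to the default method" rule, under simplifying assumptions: `LateBoundObject` is a hypothetical stand-in for an IDispatch object, arguments are plain ints instead of VARIANTs, and a missing member throws rather than returning DISP_E_MEMBERNOTFOUND. The only COM fact it relies on is that an object's default member has dispatch id 0 (DISPID_VALUE).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <vector>

using DispId = long;
constexpr DispId kDispidValue = 0;  // COM's DISPID_VALUE: the default member

// Hypothetical stand-in for a late-bound (IDispatch) object: members are
// looked up by dispatch id; ints stand in for VARIANT arguments.
struct LateBoundObject {
    std::map<DispId, std::function<int(const std::vector<int>&)>> members;

    int Invoke(DispId id, const std::vector<int>& args) const {
        auto it = members.find(id);
        if (it == members.end())
            throw std::runtime_error("member not found");  // DISP_E_MEMBERNOTFOUND in COM
        return it->second(args);
    }
};

// Script code "calls" a name that the binder maps to an object, as in
// dict(1). The binder does not interpret the arguments itself; it forwards
// them to the object's default member and lets the object sort it out.
int CallBoundObject(const LateBoundObject& obj, const std::vector<int>& args) {
    return obj.Invoke(kDispidValue, args);
}
```

Under this scheme a collection whose default member is Item gets `coll(1)` behaving like `coll.Item(1)` for free, and an object with no default member simply reports the failure itself.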



As you can see in binder.cpp, there is a whole lot of parameter validation, and altogether about eleven separate cases that I cover in the implementation of Invoke. The actual implementation in VBScript and JScript is much, much more complex than this because they support IDispatchEx, garbage collection, property accessor functions, and numerous other features that complicate the code. But still, this is the longest function you're likely to see me write in this project; I'll be very surprised if I write any code more complicated than the dispatch logic.
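The flag-checking part of that validation can be sketched on its own. The numeric values below are the standard wFlags values for IDispatch::Invoke; `ClassifyInvoke`, `InvokeKind`, and the rule that rejects a put combined with a get or method call are assumptions of this sketch, not necessarily what binder.cpp does.

```cpp
#include <cassert>
#include <cstdint>

// Values of the wFlags parameter of IDispatch::Invoke.
constexpr std::uint16_t kMethod         = 0x1;  // DISPATCH_METHOD
constexpr std::uint16_t kPropertyGet    = 0x2;  // DISPATCH_PROPERTYGET
constexpr std::uint16_t kPropertyPut    = 0x4;  // DISPATCH_PROPERTYPUT
constexpr std::uint16_t kPropertyPutRef = 0x8;  // DISPATCH_PROPERTYPUTREF

enum class InvokeKind { Invalid, Call, Get, CallOrGet, Put };

// Hypothetical first pass of Invoke: decide what the caller is asking for
// before touching the name table. cArgs counts all arguments, including
// the value being assigned in the put case.
InvokeKind ClassifyInvoke(std::uint16_t flags, unsigned cArgs) {
    const std::uint16_t known = kMethod | kPropertyGet | kPropertyPut | kPropertyPutRef;
    if (flags == 0 || (flags & ~known) != 0) return InvokeKind::Invalid;

    const bool put = (flags & (kPropertyPut | kPropertyPutRef)) != 0;
    const bool fetch = (flags & (kMethod | kPropertyGet)) != 0;
    if (put) {
        // This sketch treats "assign" mixed with "call/get" as ambiguous,
        // and a put without a value to assign as malformed.
        if (fetch || cArgs == 0) return InvokeKind::Invalid;
        return InvokeKind::Put;
    }

    // VB-style callers often pass METHOD | PROPERTYGET for "x = obj.foo",
    // meaning "call it if it is a method, otherwise fetch its value".
    if ((flags & kMethod) && (flags & kPropertyGet)) return InvokeKind::CallOrGet;
    return (flags & kMethod) ? InvokeKind::Call : InvokeKind::Get;
}
```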



Speaking of garbage collection, I'm not quite sure what kind of garbage collector I'm going to build into SimpleScript. Building a mark-and-sweep collector like JScript has would be instructive, but it would also take a lot of time and effort. What do you guys think? (If you look at invoke.cpp you'll see that I'm thinking ahead about issues we're going to run into in garbage collection. We're going to run into related issues in the script execution code; it is possible for ill-behaved hosts to shut down the script engine in a callback while it is executing! We need to be robust in the face of such behaviour.)
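For readers weighing in, here is roughly what the core of a mark-and-sweep collector looks like. This is a hypothetical toy, not JScript's collector: objects hold raw pointers to each other, the roots are an explicit list, and collection is a full stop-the-world pass. The test shows the property that motivates the exercise: an unreachable cycle, which reference counting alone would leak, gets freed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical heap object: just a list of references to other objects.
struct GcObject {
    std::vector<GcObject*> refs;
    bool marked = false;
};

class Heap {
public:
    GcObject* Allocate() {
        objects_.push_back(std::make_unique<GcObject>());
        return objects_.back().get();
    }

    void AddRoot(GcObject* obj) { roots_.push_back(obj); }

    // Mark everything reachable from the roots, then free everything else.
    // Returns the number of objects freed.
    std::size_t Collect() {
        for (GcObject* root : roots_) Mark(root);

        std::size_t freed = 0;
        std::vector<std::unique_ptr<GcObject>> survivors;
        for (auto& obj : objects_) {
            if (obj->marked) {
                obj->marked = false;  // clear for the next collection
                survivors.push_back(std::move(obj));
            } else {
                ++freed;  // destroyed when objects_ is replaced below
            }
        }
        objects_ = std::move(survivors);
        return freed;
    }

    std::size_t Size() const { return objects_.size(); }

private:
    // Iterative marking with an explicit stack, so a deep object graph
    // cannot overflow the native stack; the marked check handles cycles.
    static void Mark(GcObject* start) {
        std::vector<GcObject*> stack{start};
        while (!stack.empty()) {
            GcObject* obj = stack.back();
            stack.pop_back();
            if (obj == nullptr || obj->marked) continue;
            obj->marked = true;
            for (GcObject* child : obj->refs) stack.push_back(child);
        }
    }

    std::vector<std::unique_ptr<GcObject>> objects_;
    std::vector<GcObject*> roots_;
};
```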



The next step is to get the actual guts of the binder working, and then build a list of code blocks to be executed when the script engine goes to started state. The clone semantics for the code blocks are a little tricky, so we'll get that straightened away, and then maybe, just maybe, actually write a language parser.
UPDATE: For your convenience, I've put a zip file containing all the sources at http://www.eric.lippert.com/simplescript.zip. I'll keep this zip file updated as I change the sources on the blog.

SimpleScript Part Six: Threading Technicalities
April 27, 2004 8:56

Refresher Course



Before you read this, you might want to take a quick look back at my original posting on the script engine threading model. That was a somewhat simplified (!) description of the actual script engine contract. Let me just sum up:




free threaded objects can be called from any thread at any time; the object is responsible for synchronizing access to shared resources
apartment threaded objects can have multiple instances on multiple threads but once an object is on a thread, it can only be called from that thread. The caller is responsible for always calling an object on the thread on which it was created. The object is responsible for synchronizing access to resources shared across instances.
rental threaded objects can be called on any thread, but the caller is responsible for ensuring that the object is only called from one thread at a time.
an initialized engine is apartment threaded, with a couple of exceptions -- InterruptScriptThread and Clone can both be called from any thread on an initialized engine.
an uninitialized engine is free-threaded


Things Get More Complicated



That's actually not quite right. It would be more accurate to say that an uninitialized engine is rental threaded. Why? Because otherwise it would be legal to do really dumb things like call Close on two different threads at the same time. If you look carefully at the code, you'll see that most of the methods are not robust in the face of full-on multithreading. It's the "the host isn't a bozo" threading model!



This is a ridiculously complex threading contract, I know. From COM's point of view, the script engine is free threaded -- the restrictions are so arcane that clearly the engine has got to be the one enforcing the rules, not COM, which is why you'll see lots of calls to check that the caller is on the right thread in my code.
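The per-call thread check might look something like this sketch, which uses std::thread::id in place of the Win32 thread id and a small enum in place of HRESULTs. `EngineThreadCheck`, `Bind`, and `Unbind` are hypothetical names; the methods allowed on any thread of an initialized engine (InterruptScriptThread and Clone) would simply skip the check.

```cpp
#include <cassert>
#include <thread>

enum class Result { Ok, WrongThread };  // stand-in for HRESULTs

// Hypothetical helper an engine could consult at the top of each method
// that is restricted to the engine thread. It is not itself synchronized:
// like the contract, it assumes the host isn't a bozo.
class EngineThreadCheck {
public:
    // Call when the engine moves to initialized state.
    void Bind() {
        owner_ = std::this_thread::get_id();
        bound_ = true;
    }

    // Call when the engine moves back to uninitialized state.
    void Unbind() { bound_ = false; }

    // Uninitialized: any thread may call (the host promises one at a time).
    // Initialized: only the thread that initialized the engine.
    Result Check() const {
        if (!bound_ || std::this_thread::get_id() == owner_) return Result::Ok;
        return Result::WrongThread;
    }

private:
    std::thread::id owner_;
    bool bound_ = false;
};
```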


If you take a look at the registration code, you'll see that I register the script engine as "Both", which means "either free threaded or apartment threaded, we'll sort it out". What the heck is up with that? Why not just call it "free threaded" and be done with it?



Because again, I oversimplified the description of an apartment in my original posting. A single-threaded-apartment (STA) object is created on a thread and is always called on that thread. Think of a person (an object) in a room (a thread) -- you want to talk to them (call a method), you go to their room. You can put as many people in a room as you'd like, and build as many rooms as you'd like, but if you want to call a method, you do it from the thread where the callee lives. That's all fine and good.



But there is also a multi-threaded apartment! Imagine that you take some of the rooms and you knock holes in the walls. If you're in one room and you want to talk to someone in another room, you don't have to go there, you can just yell through the hole. The guy listening to you is responsible for synchronizing all the shouting going on, but at least he knows that no one is going to be trying to talk through a wall with no hole in it.
In any given process there is one "main" STA, possibly many more secondary STAs, and one MTA. You can't have two distinct sets of knocked-through rooms, each talking freely among themselves but not to each other.



I briefly described in an earlier post the way that COM marshals calls "through the walls" from one apartment to another. Well, if we register the script engine as a free threaded object, COM is going to think that it lives in the process's MTA, and that any STA objects created by the process live in their own STAs, and therefore, all calls between the two apartments are going to have to be marshaled through proxies and stubs. That extra indirection really screws up your performance numbers, lemme tell ya. We register as "both" threaded, and COM says "OK, you sort it out then if you're so smart", which we do by requiring that the host call us on the right thread and give us objects that are on the right thread.



Even that is still a considerable oversimplification -- there's still the Neutral Threaded Apartment that I haven't talked about yet, and the interactions between the CLR threading model and the COM threading model, and how marshaling really works, but that's getting out of my depth. Go ask Christopher Brumme if you've got questions about that stuff.



Honouring Our End Of The Script Engine Contract



Let's take a look at the script engine contract through the example of one of its objects that we've already implemented -- the named item list. There are only four operations that can be performed on the named item list, and we know when they can be performed according to our contract and implementation:




Add can only be called on an initialized engine, and hence only from the engine thread. Add writes to the named item list.
Reset can only be called on an initialized engine as it is being moved to uninitialized state, hence only from the engine thread. Reset writes to the named item list by removing non-persisting entries.
Clear is only called when the engine is going to closed state. If the engine is already in uninitialized state, Clear can be called on any thread; if the engine is initialized, it can only be called from the main engine thread. Clear writes to the named item list by removing all entries.
Clone can be called on any thread, and reads from the named item list.


What then are the possible threading conflicts? The three writers, Add, Reset and Clear cannot be called at the same time on different threads by virtue of the fact that the engine must be initialized if Add or Reset are being called. There are a few cases which for completeness we should get right, but are in reality extremely unlikely. Why would any host be so dumb as to Clone an engine while in the middle of a call to Add? Or worse, Clone during Close? I won't assume that hosts won't pull shens like that, even though they are very unlikely.



What about two Clones at the same time on two different threads? On the one hand, they're only reading, so why should they block? On the other hand, boy, do I ever not want to implement single-writer-multi-reader mutexes just to make that extremely unlikely case marginally faster.



Therefore, we'll do it the easy way. The only thing we really need to worry about practically is one thread doing a Clone while another thread is doing a Reset, but we'll get it right for all the cases. The first thing we'll do when we enter any of those methods is enter a critical section, and the last thing we'll do before we leave is exit it. Rather than mess around with the operating system's somewhat gross critical section code in the object itself, I define a handy object to wrap it. See mutex.cpp for the implementation.



Something to note about this implementation is that it uses InitializeCriticalSectionAndSpinCount to initialize the critical section. The comments there and here are required reading if you need to make critical sections work on heavily loaded pre-Windows-XP boxes. On earlier versions of Windows, the critical section functions throw exceptions rather than returning error codes, which means that every entry to and exit from a critical section has to be protected with __try blocks, and then you have to get the exception handling right! The VBScript and JScript engines have all kinds of totally gross code in them to handle the edge case where a heavily loaded server runs out of memory just as a critical section is about to be entered. (Yes, it happens. Every single out-of-memory case will eventually be exercised by a sufficiently loaded server, I know this from painful experience.)
I'm going to skip all that totally gross code here and assume that we all live in the happy world of Windows XP, where the operating system actually returns sensible errors.
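Here is a portable sketch of the "enter on the way in, leave on the way out" scheme applied to the named item list. It assumes std::mutex and std::lock_guard as stand-ins for the Win32 critical section and the wrapper object in mutex.cpp; `NamedItemList`, `CloneInto`, and the `persistent` field are hypothetical names. Reset keeps only persistent items and Clone copies only persistent items, matching the rules for SCRIPTITEM_ISPERSISTENT.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

struct NamedItem {
    std::wstring name;
    bool persistent;  // added with SCRIPTITEM_ISPERSISTENT
};

// Hypothetical sketch: every operation holds the same lock for its whole
// body. std::lock_guard enters in its constructor and leaves in its
// destructor, so early returns and exceptions cannot leak the lock.
class NamedItemList {
public:
    // Writer: engine thread only, engine initialized.
    void Add(const std::wstring& name, bool persistent) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(NamedItem{name, persistent});
    }

    // Writer: moving back to uninitialized state keeps persistent items.
    void Reset() {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.erase(std::remove_if(items_.begin(), items_.end(),
                                    [](const NamedItem& item) { return !item.persistent; }),
                     items_.end());
    }

    // Writer: going to closed state drops everything.
    void Clear() {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.clear();
    }

    // Reader: may run on any thread. Copies the persistent items under this
    // list's lock, then installs them under the target's lock; the two locks
    // are never held together, so there is no lock-ordering deadlock.
    void CloneInto(NamedItemList& target) const {
        std::vector<NamedItem> copy;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            for (const NamedItem& item : items_)
                if (item.persistent) copy.push_back(item);
        }
        std::lock_guard<std::mutex> lock(target.mutex_);
        target.items_ = std::move(copy);
    }

    std::size_t Count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<NamedItem> items_;
};
```

A single lock shared by readers and writers is exactly the "easy way": two concurrent Clones serialize, which costs a little in a case that practically never happens.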



I'm still trying to sort out how all this is going to work once code blocks are thrown into the mix. I'll try out a few things and see how it goes. More bulletins as events warrant.

SimpleScript Part Five: Named Items and Modules
April 23, 2004 5:29




Named Items



"Named items" are what we call the "top level" objects of the host provided object model. WScript in WSH, window in Internet Explorer, Response in ASP, are all named items. A host tells an initialized script engine about named items via the aptly named AddNamedItem method on the IActiveScript interface.



HRESULT ScriptEngine::AddNamedItem(const WCHAR * pszName, DWORD flags)



A few things should immediately seem a little weird about this interface.



The First Four Flags



First off, what are the flags? There are six. The first four are pretty straightforward:



If SCRIPTITEM_ISVISIBLE is set then the name of the named item is added to the global namespace.



Uh, OK… why would you ever NOT want this set? Didn't I just say that named items were specifically for injecting named object model roots into the engine? What good is a named item if you can't see its name?



That brings us to the second flag; if SCRIPTITEM_GLOBALMEMBERS is set then all (immediate) children of the named item are treated as though they are themselves top-level objects/methods. That's how in Internet Explorer you can say



window.alert("hello");



or



alert("hello");



and they do the same thing. IE tells the script engine that window is a visible named item with global members.
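One way a binder could support global members, including the "no dispatch id collisions" requirement mentioned earlier, is to hand out its own ids and remember which object, and which of that object's ids, each one forwards to. In this sketch `HostObject` is a hypothetical stand-in for an object like window, `GlobalMembersBinder` is an invented name, lookups are case-sensitive, items are searched in the order they were added, and the starting id 1000 is an arbitrary assumption.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

using DispId = long;

// Hypothetical stand-in for a host object such as IE's window: it can map
// its own member names to its own dispatch ids (GetIDsOfNames).
struct HostObject {
    std::map<std::wstring, DispId> members;

    std::optional<DispId> IdOf(const std::wstring& name) const {
        auto it = members.find(name);
        if (it == members.end()) return std::nullopt;
        return it->second;
    }
};

// Two host objects may both use id 1 for different members, so the binder
// never exposes their ids directly. It issues its own id per name and
// records where calls on that id should be forwarded.
class GlobalMembersBinder {
public:
    struct Target {
        const HostObject* object;  // which global-members item owns the name
        DispId hostId;             // that object's own id for the member
    };

    void AddGlobalMembersItem(const HostObject* obj) { items_.push_back(obj); }

    // Searches the items in the order they were added; first match wins.
    std::optional<DispId> Lookup(const std::wstring& name) {
        auto known = idsByName_.find(name);
        if (known != idsByName_.end()) return known->second;
        for (const HostObject* obj : items_) {
            if (std::optional<DispId> hostId = obj->IdOf(name)) {
                DispId id = nextId_++;
                idsByName_.emplace(name, id);
                targets_.emplace(id, Target{obj, *hostId});
                return id;
            }
        }
        return std::nullopt;
    }

    // Where an Invoke on a binder-issued id should go; nullptr if unknown.
    const Target* TargetOf(DispId id) const {
        auto it = targets_.find(id);
        return it == targets_.end() ? nullptr : &it->second;
    }

private:
    std::vector<const HostObject*> items_;
    std::map<std::wstring, DispId> idsByName_;
    std::map<DispId, Target> targets_;
    DispId nextId_ = 1000;  // arbitrary base, assumed clear of other binder ids
};
```

Note that nothing here needs the item's own name, which is why an invisible named item with global members still works.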



Now it makes a little more sense why you might want to have an invisible named item. What if you had an object with lots of methods and properties that you wanted available in the global namespace, but the object itself didn’t have a sensible name? I can't think of any script host offhand that does that, but the capability is there if you need it.



There's a second reason why you might want a named item to be invisible, but we'll get to that in a minute.



The third flag is SCRIPTITEM_ISSOURCE. If that's set then we know that this object is an event source. If the language supports implicit event binding and the host moves the engine into connected state, we're going to need to know which named items to hook up to which events. It can be very expensive to do that hookup, so this provides a simple optimization. If the host knows that a particular named item does not source events, it can choose to not mark the named item as a source, and we therefore never spend any time trying to hook up event sinks to it.



The fourth flag is SCRIPTITEM_ISPERSISTENT. Recall that I said a while back that when the engine goes back to uninitialized state after being initialized, we throw away "some" named items, where "some" was to be defined later. Now you know -- the engine remembers named items marked as persistent. Information about those named items is not thrown away until the engine is closed. Also, cloning an engine is basically making a copy of the uninitialized state of an engine, so persistent named items get cloned when their engine gets cloned. As we'll see, this fact has implications for our implementation.
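The four flags covered so far boil down to four independent yes/no decisions about a named item. The constants below use the SCRIPTITEM_* values from activscp.h (worth double-checking against your SDK headers); `NamedItemPlan` and `PlanFor` are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kIsVisible     = 0x00000002;  // SCRIPTITEM_ISVISIBLE
constexpr std::uint32_t kIsSource      = 0x00000004;  // SCRIPTITEM_ISSOURCE
constexpr std::uint32_t kGlobalMembers = 0x00000008;  // SCRIPTITEM_GLOBALMEMBERS
constexpr std::uint32_t kIsPersistent  = 0x00000040;  // SCRIPTITEM_ISPERSISTENT

// What the engine has to do for a named item, one field per flag.
struct NamedItemPlan {
    bool addNameToGlobalNamespace;   // ISVISIBLE
    bool exposeChildrenAsGlobals;    // GLOBALMEMBERS
    bool hookUpEventsWhenConnected;  // ISSOURCE
    bool survivesResetAndClone;      // ISPERSISTENT
};

NamedItemPlan PlanFor(std::uint32_t flags) {
    return NamedItemPlan{
        (flags & kIsVisible) != 0,
        (flags & kGlobalMembers) != 0,
        (flags & kIsSource) != 0,
        (flags & kIsPersistent) != 0,
    };
}
```

For example, IE's window (visible, with global members) yields a plan that adds the name and exposes the children, while an item with only GLOBALMEMBERS set exposes its children without being nameable itself.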



Modules



I'm sure you have a general idea of what I mean by a "module", though if you Google define:module you'll see that everyone has a slightly different definition. Modules in the script engine sense are philosophically fairly straightforward. I often want to have some way to say "this collection of functions can play with each other, but are isolated from this other collection of functions". I want to be able to resolve name collisions by having two methods with the same name coexist in different modules.



Well, I want that stuff in languages designed for "programming in the large". When using script languages, more often than not you simply don't need to chunk stuff into modules. But, bizarrely enough, the script engines support modules, and in a pretty goofy way at that. Of course, any language implementor can implement module semantics however they want, but I see no reason to mess around with the de facto standards. Here's how modules work in VBScript and JScript:




There is a "global" module. Procedures defined in the global module are callable from any module. This is where the "built in" methods in VBScript and JScript go.
Every named item is associated with a unique module, with some exceptions:

Named items with global members are associated with the global module.
Named items marked with SCRIPTITEM_NOCODE are not associated with any module.
All visible named items are added to the global namespace, except for named items marked as SCRIPTITEM_CODEONLY. Those are just names of modules and are associated with no object.
Important distinction: though all visible named items are added to the global module's namespace, procedures in the named items' modules are not visible from the global module.
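These rules can be summarized as a small decision function. A sketch only: the constant values mirror activscp.h; `PlaceNamedItem`, `CanCallProcedure`, and the use of an empty string for the global module are hypothetical; letting GLOBALMEMBERS win when it is combined with NOCODE is an arbitrary choice, since the list above doesn't say; and it assumes a procedure in one named item's module cannot be called from another named item's module either.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr std::uint32_t kIsVisible     = 0x00000002;  // SCRIPTITEM_ISVISIBLE
constexpr std::uint32_t kGlobalMembers = 0x00000008;  // SCRIPTITEM_GLOBALMEMBERS
constexpr std::uint32_t kCodeOnly      = 0x00000200;  // SCRIPTITEM_CODEONLY
constexpr std::uint32_t kNoCode        = 0x00000400;  // SCRIPTITEM_NOCODE

// Which module, if any, code added under this named item belongs to.
enum class ModuleKind { Global, Own, None };

struct Placement {
    ModuleKind module;
    bool nameInGlobalNamespace;  // is the item's object reachable by name?
};

Placement PlaceNamedItem(std::uint32_t flags) {
    Placement p{};
    if (flags & kGlobalMembers) p.module = ModuleKind::Global;  // wins over NOCODE here
    else if (flags & kNoCode) p.module = ModuleKind::None;
    else p.module = ModuleKind::Own;
    // CODEONLY items are just module names; there is no object to expose.
    p.nameInGlobalNamespace = (flags & kIsVisible) && !(flags & kCodeOnly);
    return p;
}

// Procedure visibility, with "" standing for the global module: global
// procedures are callable from anywhere, others only from their own module.
bool CanCallProcedure(const std::string& fromModule, const std::string& definedIn) {
    return definedIn.empty() || definedIn == fromModule;
}
```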


Let me try to make this a little more concrete. Let's consider a hypothetical declarative language that supports embedded imperative script. You might want to have something like this:



