[prog] Converting definitions into 'on the fly' code...

Rasjid Wilcox rasjidw at openminddev.net
Fri May 7 00:10:45 EST 2004


On Thursday 06 May 2004 00:26, Conor Daly wrote:
> On Wed, May 05, 2004 at 10:11:45PM +1000 or so it is rumoured hereabouts,
> Rasjid Wilcox thought:
> > If this is what you mean, then my answer is categorically (particularly
> > given the desire to avoid large amounts of complexity) to use your
> > interpreted language of choice.  My choice (when I have one) is Python.
>
> It may be that this is the way to go.  The question there is how will this
> scale in the face of large amounts of data?  Can I write a Python module
> for the testing bit and call this from a compiled core program (yes, I
> probably can)?

Yes, you can embed Python into your C program; however, see
http://www.twistedmatrix.com/users/glyph/rant/extendit.html for a rant on why 
it may be better to *extend* Python with your C program, rather than embed 
Python into it.

In practice, what this means is that you write some libraries in C (stuff that 
needs the speed of C or low-level hardware access, etc.) and then call these 
from your Python program.
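
As a rough illustration of the 'extend' approach, here is a minimal sketch 
using ctypes.  The library name libqc.so and its mean_range() function are 
made up for the example; an extension module written with the Python/C API 
would be driven from the Python side in much the same way.

    import ctypes

    # Load the compiled C library (hypothetical name) and describe one function.
    libqc = ctypes.CDLL("./libqc.so")
    libqc.mean_range.restype = ctypes.c_double
    libqc.mean_range.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_int]

    # Python drives the logic; the C code does the number crunching.
    values = (ctypes.c_double * 10)(3, 5, 4, 6, 2, 7, 5, 4, 6, 8)
    print(libqc.mean_range(values, len(values)))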

>  If I need to retrieve more data for a particular test,
> will I end up with a module so complex that I might as well write the
> whole thing in Python?

See above.

> > If this is what you mean, I can put together a quick sample app in
> > Python, since I've done something very similar already.  Let me know.
>
> Yes please!  I've been poking around with a C example based around a
> parsing tree as detailed in my first post but I'd love to see a few "how
> you do it in other languages" examples...
>
> Spec:
>
> Given this string:
>
> "a < mean ( b[1] : b[10] ) / 2"
>
> 1. Analyse the tokens 'a', 'b[]' and locate the data
> 2. Analyse the expression
> 3. Conduct the specified operation on the data
> 4. Return a result

Questions:
(a) Is the syntax already defined, or is that just a sample you made up along 
the lines of what you had in mind?
(b) Any chance you could send me a little sample data (offlist, perhaps two or 
three tables with 10 rows per table)?  Or just make up some data if you like.
(c) At least one, preferably two, actual formulas, and *precisely* what they 
mean.  In your above example, I don't know if you mean the mean of b[1] and 
b[10], or the mean of b[1], b[2], ..., b[9], b[10].  I'm assuming you mean 
the latter (see the sketch after these questions).
(d) What is the database backend being used?  This is mainly relevant since 
there is little point in going the Python route if the database concerned does 
not have any Python drivers.
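
To make (c) concrete, here is a rough sketch of the kind of thing I have in 
mind: convert the test definition into a Python expression on the fly and 
evaluate it against one record.  The range syntax (b[1] : b[10] meaning b[1] 
through b[10] inclusive) and the record layout are just my assumptions about 
your spec.

    import re

    def mean(values):
        return float(sum(values)) / len(values)

    def run_test(expr, record):
        # Rewrite "b[1] : b[10]" as the Python slice "b[1:11]" (inclusive range).
        def to_slice(m):
            name, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
            return "%s[%d:%d]" % (name, lo, hi + 1)
        py_expr = re.sub(r"(\w+)\[(\d+)\]\s*:\s*\1\[(\d+)\]", to_slice, expr)
        # Evaluate with only the record's names and mean() visible.
        names = dict(record)
        names["mean"] = mean
        return eval(py_expr, {"__builtins__": {}}, names)

    # b[1]..b[10] live in positions 1..10 of a list (index 0 unused).
    record = {"a": 4.0, "b": [None, 3, 5, 4, 6, 2, 7, 5, 4, 6, 8]}
    print(run_test("a < mean ( b[1] : b[10] ) / 2", record))   # prints False

Obviously eval() needs care if the test definitions come from untrusted users, 
but it shows how little code the 'convert to code on the fly' approach needs.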

> [0] I'm giving the database designer heart attacks suggesting there should
>     be a table that is an index to the tables.  eg.
>
> Stno | parameter | description | start_date |   end_date | table   | column
> -----+-----------+-------------+------------+------------+---------+-------
> 373  |    td     |   drybulb   | 1993-04-03 | 1995-12-31 | hours   | td
> 373  |    td     |   drybulb   | 1996-01-01 |            | minutes | td1
> 373  |    td     |   drybulb   | 1996-01-01 |            | minutes | td2
> 376  |    td     |   drybulb   | 1941-01-01 |            | days    | dblb
> -----+-----------+-------------+------------+------------+---------+-------
>
> "What!  You want to include a table in your data model?!!"

Hmm... I think I understand why you want to do it this way.

My guess:  You have a whole lot of studies that have been done (and new ones 
will continue to be done in the future) that have *broadly* similar info, 
but by no means exactly the same.  Some studies may have extra data that 
other studies don't have.  Studies in the future may have extra data 
(columns) that have not even been thought of yet.  And there is little 
advantage in having all the data in a single table, since different studies 
are not directly comparable (different locations, different time-frames etc).  
Having separate tables for each study means that queries on an individual 
study are faster, and with the exception of the columns that you need to apply 
these 'common tests' to, you don't have to worry about mapping the different 
studies' datasets into a 'standardised' format.  Is this a reasonably correct 
guess?

For what it is worth, the program that is my core development role at work has 
a 'temporary' table database where one table keeps an index of all the names 
of the temporary tables, how long to keep them and what dataset they contain.  
It works for me.  I have no idea if this is frowned upon in the 'formal' 
database world - by training I'm a mathematician rather than a computer 
scientist (or whatever they call them nowadays).  I just like databases since 
they are just a modified version of set theory!  :-)
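
A bare-bones version of that registry idea looks something like this (table 
and column names invented, using sqlite3 purely for illustration):

    import sqlite3

    conn = sqlite3.connect("work.db")
    # One ordinary table records which temporary tables exist and what they hold.
    conn.execute("""CREATE TABLE IF NOT EXISTS temp_registry (
                        table_name  TEXT PRIMARY KEY,
                        dataset     TEXT,
                        keep_until  TEXT)""")
    conn.execute("INSERT OR REPLACE INTO temp_registry VALUES (?, ?, ?)",
                 ("tmp_373_hours", "station 373 hourly drybulb", "2004-06-01"))
    conn.commit()

    # Later: drop anything that is past its keep-by date.
    expired = conn.execute("SELECT table_name FROM temp_registry "
                           "WHERE keep_until < date('now')").fetchall()
    for (name,) in expired:
        conn.execute("DROP TABLE IF EXISTS %s" % name)
        conn.execute("DELETE FROM temp_registry WHERE table_name = ?", (name,))
    conn.commit()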

Actually, I can think of a way in which you may not even need the 'index' 
table.

Suppose that Test X has parameters 'a' (a single column) and 'b[]' (an array 
of columns of varying length, depending on the study).

User says: I want to run Test X on table Y.

Program says: Select the column that is parameter 'a'.
Program says: Select the columns that are 'b[]'.

User hits 'GO'.

Program shows the records that fail the test.

Is this what you would like, or is it better to have the index table as you 
have given above, since there is less chance of user error?
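
Either way, the test itself stays small.  A rough sketch of the 'no index 
table' version (all names below are placeholders): the user picks which 
columns play 'a' and 'b[]', and the program reports the records that fail.

    def mean(values):
        return float(sum(values)) / len(values)

    def run_test_x(rows, a_col, b_cols):
        # rows: list of dicts from table Y; a record fails if the
        # expression a < mean(b[]) / 2 evaluates false for it.
        failures = []
        for row in rows:
            b_values = [row[c] for c in b_cols]
            if not (row[a_col] < mean(b_values) / 2):
                failures.append(row)
        return failures

    # User selected 'td' as parameter 'a' and td1/td2 as the b[] columns.
    rows = [{"td": 1.0, "td1": 5.0, "td2": 7.0},
            {"td": 4.0, "td1": 5.0, "td2": 7.0}]
    print(run_test_x(rows, "td", ["td1", "td2"]))   # second record fails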

Cheers,

Rasjid.

-- 
Rasjid Wilcox
Canberra, Australia (UTC +10 hrs)
http://www.openminddev.net


