https://github.com/majidaldo/personal-compute-cloud
Problem: Automate computing infrastructure setup
Solution: Docker hosts on CoreOS machines provisioned with Ansible.
I've recently finished coding up a solution to tackle 'personal' distributed computing. I was bothered by the (apparent) lack of a framework to handle the coordination of setting up multiple machines, and shell scripts are messy. Once I learned Ansible, I was bothered no more! (It will be the only systems automation tool I use for the foreseeable future. Ansible is AWESOME!)
Catering to the Scientific Computing Workflow: However, mere automation was not my only concern. I wanted a seamless transition from what I'm working on locally to being able to bring in more computing power from remote machines. Unlike (pure) software engineering, there isn't a separate 'development' environment and 'production' environment. There are a handful of codes out there that can help you provision CoreOS clusters, but they don't fit well with the scientific computing workflow.
Status: Most of the functionality I had planned has been implemented. However, like all code, it's a work in progress. I'll be adding functionality as my priorities dictate.
Try it out.
Showing posts with label scientific computing. Show all posts
Monday, August 10, 2015
Thursday, October 30, 2014
Proprietary Mathematical Programming at STRATA Hadoop World '14
I had previously commented on the lack of analytics-focused companies at STRATA. Now, to my surprise, two of the three big "M's" in proprietary computer algebra systems made a showing: Mathematica and MATLAB (the third being Maple). With the strength of the open-source software movement in "big data" and technical computing, I was wondering about what value they had to offer.
After having some discussions with them, the most important thing they offer is a consistent and tested user experience with high-quality documentation and technical support. This is something open source software needs to work on. I alluded to this in a post almost two years ago, and it is still applicable today: I don't see any major change in the user experience of open source software (hope it works on Windows, hope it compiles, hope the docs are good, hope I can email the author of the code, hope this package at this version works with that package at that version, hope it runs fast... etc.). I guess it's just a natural consequence of open source; open source is flexible, cutting-edge, and cool, but not necessarily user-friendly.
In a related note, on the hardware side, Cray, a brand name in scientific computing, is leveraging its high-performance computing experience for "big data". This is in contrast to computing on commodity hardware which is what Hadoop was intended for.
This is all in the general trend of technical computing (historically driven by scientific applications) merging with the "big data" world.
Friday, August 29, 2014
For data analysis teamwork: A trick to combine source control with data sharing
For analytical teamwork, we have, primarily, two things to manage to enable accountability and reproducibility: data and code. The fundamental question to answer is what (version of) source and data led to some results.
The problem of managing code is pretty much solved (or at least there are many people working on it). See this thing called git, for example. Managing data needs more work, but it's being worked on; dat is the only tool I know of that can provide some kind of versioning of data akin to git. Keep in mind that some databases have snapshotting or timestamping capability. But in the view of dat, a database is just a storage backend, since you would version with dat.
Suppose you're not worried about versioning data; perhaps you've got your own way of versioning data, or the data you're working with is assumed fixed and its appropriate store is a filesystem. There are various systems to share the data files, but then the data lives in a separate world from the code. It's possible to store data in a source control system, but that is generally an inefficient way to deal with data, especially if it's large.
Now wouldn't it be nice to also have source controlled text files such as metadata or documentation interleaved with the data files? To accomplish this, two things need to happen:
- The data needs to be in the scope of the (source) version control system, but the version control system must NOT manage the data files,
- and the file sharing system needs to NOT share the source-controlled files.
Let's see how to implement this with git for source control and BitTorrent Sync (BTSync) for file sharing. Your project directory structure will be like:
- project_folder
  - .gitignore (root) (git)
  - src_dir1
  - src_dir2
  - data_store
    - .SyncIgnore (BTSync)
    - .gitignore (child) (git)
    - data_dir
      - readme.txt
      - hugefile.dat
    - data_dir2
Let's explain this setup by examining the *ignore files. By excluding certain files, the *ignore files achieve the two exclusions required.
- .gitignore (root) is typical and left unchanged.
- .SyncIgnore tells BTSync not to synchronize, for example, the readme files that live in the data directory, since they are source controlled. (Not essential to the setup, but it might be preferable to only sync "input" data files and not files that are generated.)
- .gitignore (child) works with .SyncIgnore. We tell git not to manage files in the data directory as a general rule, except for the source-controlled files that we listed in .SyncIgnore. We also need to tell git not to manage BTSync's special files and folders. (Not essential to the setup, but we can take things further by source controlling .SyncIgnore itself.)
I'm sure there are various ways of dealing with the *ignore files but what I've shown here should apply to many cases.
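To make the setup concrete, here is a hypothetical sketch of what the two child files inside data_store might contain (the exact patterns depend on your layout; readme.txt stands in for any source-controlled metadata file):

```
# data_store/.gitignore (child) -- hypothetical sketch
*                # by default, ignore everything under data_store
!*/              # ...but allow git to recurse into subdirectories
!.gitignore      # ...and keep tracking this ignore file itself
!readme.txt      # ...and the source-controlled metadata files
.Sync*           # make sure BTSync's special files/folders stay untracked
!.SyncIgnore     # optional: source control the sync ignore file too

# data_store/.SyncIgnore -- hypothetical sketch
.gitignore
readme.txt
```

In gitignore files, later rules override earlier ones, so the order of the negated (`!`) patterns matters.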
With this setup, your team achieves better synchronization of project files! Please let me know if I need to explain the system better.
Wednesday, July 2, 2014
How the mathematically-trained can increase their programming skill
One-sentence summary for the window shoppers: To improve their programming skill, the mathematically trained need to deliberately build sophistication into their code.
Computer science people, this post is not for you. Mathematical people who haven't developed programming skill, this post is for you. I'm aiming this post at people who want to get involved in some kind of mathematical computing: scientific computing, statistical computing, or data science, where a crucial skill of the job is programming.
I was spurred to write this post by this tweet from Matt Davis, backed by personal experience. I don't have formal training in computer science as many software engineering professionals do. Yet, I managed to be at least functional in programming and conversant with software engineering practice. So I'd like to share my story in the hope that it can benefit others.
"Huge fraction of attendees at #pydata code for a living but not have CS degrees." — Matt Davis (@jiffyclub) May 4, 2014
When I first started programming, all I cared about was the results that would come out of some mathematical formulation that I wanted to implement (and that's how I was assessed as well). These exercises have pretty much always followed the workflow diagrammed below:
problem/question -> math -> code -> execute -> output -> analyze -> communicate results (feedback loops not shown)
You could get by implementing this workflow by writing quick and dirty code. While writing dirty code may be appropriate for one-time tasks, you are not realizing your full potential if you keep doing this.
In my case, the pursuit of efficiency, flexibility, and just the 'cool' factor led me down the path of actually becoming a better programmer instead of just someone who wrote a lot of little programs (using Python had much to do with increasing my skill, but that's another story). I attribute the increase in my skill to interest in the following:
Improving program logic:
- Generalization, which leads to abstraction of code. How do I make my code apply to more general cases?
- Code expressiveness. How do I best represent the solution to my problem? What programming paradigm should I use: object-oriented? functional? This is related to generalization and abstraction, and is closely related to code readability and maintainability.
- Robustness. How does the code handle failure? How does it handle corner cases? (e.g., what happens if some function is passed an empty list?)
- Portability. So I got this code working on my Windows machine. Will it work on a Mac? Linux? A compute cluster? A 64-bit operating system?
- Automation. Eliminate the need for human involvement in the process as much as possible.
- Modularity and separation of concerns. As your program gets bigger, you should notice that some units of logic have nothing to do with others. Put in the effort to separate your code into modules. This aspect is also related to code maintainability.
- High performance. Can I make my code run faster? As this is a major concern for scientific computing and big data processing, you must understand some details of computer hardware, compilers, interpreters, parallel programming, operating systems, data structures, databases, and the differences between higher and lower-level programming languages. This understanding will be reflected in the code that you write.
- Don't repeat yourself (DRY). Sometimes, a shortcut to deliver a program is to duplicate some piece of information (because you didn't structure your program in the best way). Resist this temptation and have a central location for the value of some variable so that a change in it propagates appropriately throughout your system.
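To illustrate a couple of these points (generalization and DRY) with a toy example of my own, not one from the post:

```python
# Quick-and-dirty: one specific polynomial, its coefficients baked in.
def poly_23_at(x):
    return 2 * x ** 2 + 3 * x + 1

# Generalized + DRY: the coefficients live in ONE place, and the same
# code evaluates ANY polynomial, not just this one.
def make_poly(coeffs):
    """Return a function evaluating the polynomial whose coefficients
    are given from the highest degree down to the constant term."""
    def poly(x):
        result = 0
        for c in coeffs:
            result = result * x + c  # Horner's rule
        return result
    return poly

p = make_poly([2, 3, 1])  # the same polynomial as poly_23_at
print(p(2), poly_23_at(2))  # 15 15
```

Changing a coefficient now means editing one list instead of hunting through the code for every place the value was duplicated.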
Improving productivity:
- Testing. As your program gets larger and more complex, you want to make sure its (modular!) components work as you develop your code. Test in an automated(!) fashion as well.
- Documentation. Expect that you'll come back to your code later to modify it. Save yourself, and others(!), some trouble down the road and document what all those functions do.
- Source Control. You need to be able to track versions of your code to help in debugging and accountability in teams. A keyword here is 'git'.
- Debugging. Stop using print statements. They may be OK for quick diagnosis, but just "byte" the bullet and learn to use a debugger. You'll save yourself time in the long run. Nonetheless, you can minimize your use of a debugger by writing modular and robust code from the start.
- Coding Environment. Integrated development environment vs text editor. VI vs Emacs vs Nano vs Notepad++ vs ...etc. Eclipse vs Visual Studio vs Spyder vs ...etc. Read up about these issues.
- Concentration. Don't underestimate the importance of sleep and uninterrupted blocks of time. I find that crafting quality code can be mentally taxing thereby requiring my focus. Also, having a healthy lifestyle in general is also relevant. I like to code while listening to chillout music with a moderate intake of a caffeinated drink.
At first I thought this entry was going to be a joke but on second thought it's really not, even though it's not a technical aspect of the work. See, I didn't boldface this point.
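As a minimal sketch of the testing point (my own example; in practice a runner such as pytest would discover the test_* functions automatically):

```python
def mean(xs):
    """Arithmetic mean; raises ValueError on an empty list (robustness!)."""
    if not xs:
        raise ValueError("mean of an empty list is undefined")
    return sum(xs) / len(xs)

# Automated checks: run these every time the code changes.
def test_mean_basic():
    assert mean([1, 2, 3]) == 2

def test_mean_empty_list_is_an_error():  # the corner case from above
    try:
        mean([])
    except ValueError:
        return  # expected
    raise AssertionError("mean([]) should raise ValueError")

test_mean_basic()
test_mean_empty_list_is_an_error()
```

Note how testing rewards the modularity and robustness points above: small, separated functions with defined failure behavior are exactly the ones that are easy to test.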
So I haven't revealed anything new here but I hope putting this list together has some value. Also, efforts like Software Carpentry can put you on the fast-track towards improving your skill.
But as with every profession, you must (have the discipline to) practice.
Tuesday, December 24, 2013
PyData NYC 2013.. Breeding Ground for Highly-Innovative Data Science Tools
I recently attended the PyData conference in New York City. I'd like to give my general impression (as opposed to details you can get by visiting the links on the website). The impression I want to express is that of the culture at the conference: hacky... in a good way. That is, tools written by programming non-specialists to tackle problems in the application domains they are specialists in. This is partly due to the flexibility of Python and partly due to the highly technical profile of the conference attendees.
This is just an impression, so it's very subjective, and I'm biased by my enthusiasm for Python and my scientific computing background, which Python has a firm grip on. But try comparing PyData to Strata, which is a higher-level view of the data science world, with R being the main analysis tool. The broader data science world is a collision of computer science and statistics, both in theory and in the tools used, while the narrower PyData world has its roots in the more harmonious scientific computing world where math meets computing, albeit math that is generally less statistical.
Until data science curricula are developed, I believe a scientific computing background is a better foundation for data science than statistics alone or computer science alone. Computational, data, and domain expertise are all present in skilled scientific programmers, some of whom attended the conference. The caliber of talent at the conference was astounding. Attendees could, for example, talk about specific CPU and GPU architectures, database systems, compiled languages, distributed systems, and GUIs, as well as Monte Carlo techniques, machine learning, and optimization. Such broad knowledge is important for implementing a speedy scientific workflow, which happens to be necessary for data science as well.
I'm also going to claim there are more tech-savvy scientists than there are tech-savvy statisticians. This isn't to diminish the importance of statisticians in data science, but the computational pride of statisticians is in the comprehensive but slow and inelegant R language. Meanwhile, scientific programmers know about, and have built ecosystems around, C/C++, Fortran, and Python, and all the CS subjects associated with these languages, including parallel programming, compilation, and data structures. This is just the natural result of statisticians traditionally working with "small data" while scientific programmers often work with "big computation".
The mastery of these issues within the same community is what allows for innovative products such as SciDB, Bokeh, Numba, and IPython, and all the various small-scale hackery presented at the conference.
Perhaps I should hold off on making such claims until I go to the next Strata conference, but this is a blog journal and I'm very late in writing about PyData!
Wednesday, March 20, 2013
C vs. Python in the Context of Reading a Simple Text Data File
Problem: Using C, read a Matrix Market file generically and versatilely.
Generically: using the same code (at run-time), interpret a line of data in various ways
I'm trying to do this in C. Coming from Python, C makes me feel CRIPPLED. Laundry list of offenses:
- Primitive string operations... oh, actually, they are just character arrays. Example: char teststring[10]="asdf"; is fine as an initialization, but teststring="qwer"; fails. What I expect to be a reassignment fails; it turns out you must call strcpy and make sure the destination has enough space.
- No negative indexing.
- Can't get the length of an array.
- Can't handle types as objects. You can't say: if this object is of this type, then do this. Types are 'hardcoded'. You can't return a type.
- Case labels must be known at compile time. The C compiler can't understand that if I use a function output, with an input known at compile time, to make a label, then that label could be known at compile time. As a mathematically-inclined programmer, all I care about is functionality; I don't like to think about the preprocessor as a separate process.
- Your program can compile and work, but in the realm of "undefined behavior". Good luck finding a bug caused by "undefined behavior".
While I do think lower-level functions are essential for some purposes, this exercise makes me appreciate Python more (yes particularly Python and not other interpreted languages). In Python, I can read the lines in as a matrix of strings, then I could vectorize type conversion operations in a few lines of code.
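A hedged sketch of what that looks like in Python (my own illustration; the helper name and sample lines are made up, and a real Matrix Market reader would also parse the header):

```python
def read_table(lines, types):
    """Interpret each data line generically: `types` is a list of callables
    (e.g., int, float) applied positionally to the whitespace-split fields."""
    rows = []
    for line in lines:
        if line.startswith('%') or not line.strip():
            continue  # skip Matrix Market comment lines and blank lines
        fields = line.split()
        rows.append([t(f) for t, f in zip(types, fields)])
    return rows

data = ["% a comment", "1 2 3.5", "4 5 6.5"]
print(read_table(data, [int, int, float]))  # [[1, 2, 3.5], [4, 5, 6.5]]
```

Because types are first-class objects in Python, "which conversion to apply" is just run-time data, which is exactly the genericity that C's compile-time type system resists.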
So, I've resorted to code generation to create functions for all cases which solves half the problem. I won't bother trying to make things more generic. If it could be done, I expect that it's going to look messy.
4/17 Update: I've finished my reader. The "main()" file is here. I think I organized my procedure well, but it took a lot of code. It's ridiculous. I could accomplish the same, with acceptable speed, in about two dozen lines of Python code. I know this because I've written code to read data generically in Python several times.
Thursday, February 14, 2013
Mathematica vs. Maple vs. SAGE/scipy UX
This post comes from significant experience with Maple and numpy/scipy/python.
I've been using Maple and numpy for a few years. I stopped upgrading my Maple at version 13 because improvements didn't compel me to pay for a newer version for my use. I chose Maple as a happy medium between the numerical and matrix-focused MATLAB and the symbolically-focused Mathematica. They can all do symbolics and matrix operations but they differ in how elegantly they're performed. But, as I gave Maple more complex mathematical tasks such as the calculus involved with Fourier transforms, and symbolic tensor & matrix operations, it became clear that it wasn't up to the task. I knew Mathematica was superior for those tasks but I chugged along with Maple until I got to George Mason University where they had a site license for Mathematica.
Right off the bat I could see a clear superiority in the user experience (mind that I didn't use Maple beyond version 13). Vectors and matrices are just lists. In Maple, there are objects called vectors, lists, tables, and matrices, and you sometimes had to import them from a module. Clearly, in just this area, Mathematica is superior. Mathematica includes a lot of functionality out of the box. In contrast, in Maple, I had to import modules for what I considered basic functionality, which makes the experience less consistent.
In comparison to other math environments:
- SAGE: SAGE uses a web interface, which can't come close to a desktop application's functionality; the UI does not come close to Mathematica's. Also, the open-source symbolic packages it uses are primitive compared to Maple, let alone Mathematica. Oh, and good luck installing it on Windows.
- numpy/scipy: Numpy and scipy give mathematical functionality as opposed to being a math environment; basic ones at that. But, its indexing coolness does exist in Mathematica.
- Mathematica has pattern matching... others don't. Enough said.
For math-centric programming, Mathematica sets a high bar in terms of consistency, documentation, functionality, and general experience. My general gripe about open source is the lack of an umbrella vision that moves a project in a certain direction. SAGE is the best (the only) hope, but it has a long way to go. Even in the best scenario, getting different Python packages seamlessly working together is doubtful (e.g., look at my sym2num function).
Conclusion: If all you need is basic functionality, Octave, SAGE, scipy, sympy, and maxima could suit you. For more advanced tasks, be prepared to pay up.
PS: This post will change as I learn more.
Support for my opinion.
Labels: cas, maple, mathematica, numpy, python, sage, scientific computing, scipy
Tuesday, November 6, 2012
Making My Near-Field Radiation Calculator Application for nanohub.org
I wanted to share a bit of my experience developing my python application for nanohub.org. For me, there were two main issues: One is configuring the nanohub computing environment to resemble my development environment (and you'll need to figure out the details of installing your tool by looking at other tools and some docs that can help). The other is using Rappture to implement the functionality I want for my application.
To set up my development environment in nanohub, I first delved into how I could make a self-contained python installation. This had me looking into pythonbrew and virtualenv with pythonbrew being more useful. But the admins at nanohub were willing to install the packages that I needed so I didn't have to mess with that too much. Nonetheless, these two projects are great for creating portable python projects.
After I took care of the development environment, I went ahead and started learning about Rappture. I needed some flexibility and what I call 'dynamic input'. Now I could have made a simple application but then it wouldn't be as useful. I had to bend the API to suit my needs. I ended up merging what should have been two simple applications into one more complicated application. That is: I had two sets of inputs for the same calculation. I'll list the issues I had with Rappture:
- Lacking documentation, especially for the Python API. This is compensated for by the many examples and the documentation of the other language APIs; I mainly figured out how Rappture works by example.
- Lack of computation status output, as addressed by this page.
- Not all GUI elements are available in the GUI builder.
- Could not generate a proper contour plot. I used the field and cloud elements to generate one, which probably wasn't their intended use, and it worked, but horribly: it struggled to display even a 50x50 grid calculation, and I couldn't add axis labels. Also of general note, these elements take numbers as strings, which is inefficient (but works). In my case, I wanted to display at least 1000x1000 points, so my solution was to ditch this GUI element and just use an image generated by matplotlib's contourf.
- 'Static' input. I wanted a user of my program to be able to change and check inputs in certain ways:
- I wanted the user to be able to add a selection to a drop-down list. This could not be done. Instead, I have the user review a table of named text inputs so that a name can be typed into a text box that should have been a selection.
- Having an input value checked against another input value. This can be done functionally by code that runs after the 'Simulate' button is hit, but I don't know how to hook into the GUI input elements so that values are checked before the inputs are processed for simulation. For example, in my program I wanted to check that the value 1.0 wasn't between two input values.
- Limited selection of units (e.g., no rad/s). There should be functionality so that any unit can be input and converted to a compatible unit.
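Since the contour element was the biggest pain point, here is a rough sketch of the matplotlib workaround; the function being plotted, the grid size, and the output file name are placeholders for illustration, not the program's actual calculation:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; the saved image is what gets shown in the GUI
import matplotlib.pyplot as plt

# Placeholder 2D function evaluated on a dense grid. Rappture's mesh-based
# elements struggled at 50x50; matplotlib handles 1000x1000 without trouble.
x = np.linspace(0, 1, 1000)
y = np.linspace(0, 1, 1000)
X, Y = np.meshgrid(x, y)
Z = np.sin(10 * X) * np.cos(10 * Y)

fig, ax = plt.subplots()
cs = ax.contourf(X, Y, Z, levels=20)
ax.set_xlabel("x")  # axis labels, which the mesh element couldn't provide
ax.set_ylabel("y")
fig.colorbar(cs)
fig.savefig("contour.png")  # hand this image to an image output element
```

The point of the workaround is that the GUI only has to display a static image, so all the plotting flexibility of matplotlib becomes available.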
Here's how it works:
First you get a page where you can enter named electric permittivity functions.
Then you input the simulation parameters and options.
Then you get various outputs. I'm showing the cool ones here:
[Figure: The characteristic nearly monochromatic near-field radiation between SiC and SiC]
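For reference, the SiC permittivity behind that spectrum is a Lorentz oscillator model; this standalone sketch uses the phonon constants that appear in the core program listed later on this page (given in inverse centimeters):

```python
import numpy as np

def invcm2rad(ic):
    return ic * 2 * np.pi * 2.998e10  # convert 1/cm to rad/s

# Constants as in the core program's `consts` dictionary
eps_inf = 6.7
omega_LO = invcm2rad(969)
omega_TO = invcm2rad(793)
Gamma = invcm2rad(4.76)

def eps_SiC(omega):
    # Lorentz oscillator form of the SiC dielectric function
    return eps_inf * (1 + (omega_LO**2 - omega_TO**2)
                      / (omega_TO**2 - 1j * Gamma * omega - omega**2))
```

Between omega_TO and omega_LO the real part of the permittivity goes negative (the reststrahlen band), which supports the surface phonon polaritons that make the SiC-SiC near-field spectrum nearly monochromatic.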
Friday, July 6, 2012
Task 6: Baseline Near-Field Calculation
I'm going to be doing FDTD stuff but need a baseline transport calculation using a closed-form expression. Maple, Maxima, and SymPy couldn't do the integration, so I had to do it numerically in a more manual way, using my trusty Python environment with the following modules that did most of the work:
- sympy: I used this package to build the final complicated expression that gives the heat transfer. Unfortunately, it's immature software. For example, it couldn't solve sympy.solve((x**2)**0.5-9). Also, the lambda features were confusing since there were two of them in the package.
- numexpr 1.4: I used this blazingly fast expression-evaluation package to evaluate the built expression. (I had to comment out line 263 in necompiler.py because I was working with complex numbers; it's an optimization step that sorts constants, and there is no ordering for complex numbers.) See better workaround.
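The sympy-to-numexpr handoff works by turning the symbolic expression into a string. A minimal sketch of the idea, using a toy expression and plain numpy via eval in place of numexpr.evaluate so it has no extra dependencies:

```python
import numpy as np
import sympy

# Build a toy expression symbolically (a stand-in for the heat-transfer integrand)
x = sympy.Symbol('x')
expr = sympy.exp(-x**2) * sympy.cos(x)

# Convert the sympy expression to a plain string; the program's sym2num helper
# does the same, plus translations such as I -> 1j and Abs -> abs
expr_str = str(expr)  # e.g. 'exp(-x**2)*cos(x)'

# Evaluate the string over an array. numexpr.evaluate(expr_str) would compile
# and run this much faster; eval over a numpy namespace stands in here.
xs = np.linspace(0.0, 1.0, 5)
vals = eval(expr_str, {'exp': np.exp, 'cos': np.cos, 'x': xs})
```

The appeal of this string-based route is that the symbolic expression can be arbitrarily complicated, yet the numeric evaluation stays vectorized over the whole frequency-wavevector grid.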
11/8: It's published on nanohub.org.
The code for the core of the program is below.
[Figure: Prototype GUI built using hubzero's Rappture GUI builder]
#Author: Majid S. al-Dosari
#http://www.thermalfluidscentral.org/encyclopedia/index.php/Near-field_radiative_transfer_between_two_semi-infinite_media
#Calculate Bulk 1-D Near-Field Radiation between two Bulks.
#Input:
#1. permittivity of both bulks
#2. distance b/w them
#3. Temperature of each bulk T1,T2
#Output:
#Power as a function of frequency or just total power
#physics notes
#the integrand is split into TE and TM components
#these components are further split into propagating (beta/(omega/c) < 1)
#and evanescent (beta/(omega/c) > 1) waves
#computation procedure
#1. build expression for TE(S-polarization) and TM(P-polarization) waves.
#"s" in the web reference.
#See 'sept' and 'sest' names below as examples
#2. convert sympy expression to numexpr
#3. use evalq for total power, evalqw for power as a function of frequency,
#evalqwsections to get power as a function of frequency when the practical
#upper limit of integration in wavevector is unknown (infinity)
#they share the common argument sequence
#(stringformofintegrand,beta1,beta2,T1,T2,nb=1000,omega1=0,no=1000)
#beta here is normalized to omega/c. omega2 is determined as that frequency at
#which 99.99...?% blackbody radiation integral is covered at the larger
#of T1, and T2.
#nb and no are the number of points discretized on beta and omega respectively
#don't include beta=1 in the range b/w beta1 and beta2.
#use beta2=.9999 as an upperlimit for integrating propagating waves and
#use beta1=1.0001 as a lower limit for integrating evanescent waves.
#for evalq and evalqwsections, if beta2 is in the evanescent range (>1), it will
#be taken as an integration 'chunk' size and integration will continue until
#the sum asymptotes. be careful: if you know the TM waves' contribution
#is negligible, it may continue to integrate to ridiculous wavevectors and
#therefore give unrealistic results. there is an absolute limit for beta2
#around a few interatomic distances
#programdata is a dictionary that holds frequencies and wavevectors that the
#integrand was evaluated over. i use them for plot axes.
#example: evalq(sym2num(str(sept)),1.001,50,300,299,no=3e3,nb=3e2)
#use the output from evals to make a pretty picture of the evaluated func
#eg use matplotlib's contourf(swapaxis(evals output))
#coding note:
#a=1+2j
#numexpr.evaluate('(a*1j)+0')
#gives TypeError: no ordering relation is defined for complex numbers
#commented out line 263 #ordered_constants.sort() in necompiler
#see 'HACK' in sym2num
#evaluate new option to sub output inplace of input
import numpy
import sympy
import numexpr
#might be able to get away w/ 100sx100s grid
#.0001 and .9999 , 1.00001
#Material Permittivity Input
#---
def invcm2rad(ic):return ic*2*numpy.pi*2.998e10
consts={'c':2.998e8,'hbar':1.054571596e-34,'kb':1.380650277e-23
,'omega_LO':invcm2rad(969),'omega_TO':invcm2rad(793),'Gamma':invcm2rad(4.76)
,'eps_inf':6.7}
eps_inf,omega_LO ,omega_TO ,Gamma,omega=\
sympy.symbols('eps_inf omega_LO omega_TO Gamma omega')
eps_SiC=eps_inf*(1+(omega_LO**2-omega_TO**2)\
/(omega_TO**2-1j*Gamma*omega-omega**2))
#eps_Au=1-(13.71e15**2)/(omega*(omega+2*numpy.pi*1j*4.05e13))
#indexed epsilon
eps=[1,eps_SiC,eps_SiC]
#eps=[1,eps1,eps2]
beta,c=sympy.symbols('beta c')
beta=beta*(omega/c) #in terms of omega/c
def gama(j):return (eps[j]*((omega/c)**2)-beta**2)**.5
#def gama(j):return (eps[j]-(beta/(omega/c))**2 )**.5 #in terms of omega/c
#0j to fix ValueError: negative number cannot be raised to a fractional power
#not needed with sympy i think
def r0_s(j):return (gama(0)-gama(j))/(gama(0)+gama(j)) #s-pol Fresnel reflection
def r0_p(j):return (eps[j]*gama(0)-gama(j))/(eps[j]*gama(0)+gama(j))
r0=dict({'s':r0_s,'p':r0_p})
d=sympy.Symbol('d')
def sevan(sorp):
return sympy.im(r0[sorp](1))*sympy.im(r0[sorp](2)) \
*beta*sympy.exp(-2*sympy.im(gama(0))*d)/ \
sympy.abs(1-r0[sorp](1)*r0[sorp](2)*sympy.exp(-2*sympy.im(gama(0))*d))**2
#the sympy on nanohub doesn't have sympy.Abs rather it has sympy.abs
def sprop(sorp):
return (1.0/4)*beta*(1-sympy.abs(r0[sorp](1))**2)*(1-sympy.abs(r0[sorp](2))**2)\
/sympy.abs(1-r0[sorp](1)*r0[sorp](2)*sympy.exp((2j)*gama(0)*d))**2
#the s funcs /give/ the expr. they aren't funcs themselves
hbar,T,kb=sympy.symbols('hbar T kb')
Thetaexpr=hbar*omega/(sympy.exp(hbar*omega/(kb*T))-1)
Thetaexpr=Thetaexpr.subs(consts)
def Theta(omega,T):
if T==0: return 0
return Thetaexpr.subs({'T':T,'omega':omega})
def q_omega(omega,T1,T2,intofs):
return ((1./sympy.pi)**2)*(Theta(omega,T1)-Theta(omega,T2))*intofs
#import scipy
#from scipy import integrate
#useless
def intofs(sfunc,beta1,beta2):
#sfunc1=lambda b: sfunc(b+0j)#needed for numexpr
#return scipy.integrate.quad(sfunc1,beta1,beta2)[0] #slowwwww
#i made this func bc i could use scipy quad or something else
return sympy.mpmath.quad(sfunc,[beta1,beta2])
#compress const array
#evaluated numerically from the start in a simple way
def nintofs(numstr,beta1,beta2,omega,num=100):#sfunc w/o omega :(
#when i subs in an omega i get weird results
[beta,omega],db,do=geninputarray(beta1,beta2,num,omega,1234,1)
return db*numpy.sum(numexpr.evaluate(numstr))
def evals_(numstr,*args,**kwargs):#sfunc w/o omega :(
#when i subs in an omega i get weird results
[beta,omega],db,do=geninputarray(*args,**kwargs)
return (numexpr.evaluate(numstr)),db,do #with zeros you get NaNs
#swap(reshape())
def evals(sstr,*args,**kwargs):
so,db,do=evals_(sstr,*args,**kwargs)
so=numpy.reshape(so,(args[2],args[5]))
#swap args for reshape other way for contourf, then swapaxis
return so,db,do
def evalDTs(DTstr,omegas):#sfunc w/o omega :(
#when i subs in an omega i get weird results
omega=omegas
if omega[0]<.1:omega[0]=.1 #gets rid of NaNs
return numexpr.evaluate(DTstr)
#the limits of integration
planckexpr=(hbar*omega**3/(4*sympy.pi**3*c**2))\
/(sympy.exp(hbar*omega/(kb*T))-1)
planckexpr=planckexpr.subs(consts)
def planckd(o,T):return planckexpr.subs({'omega':o,'T':T})
#dplanckexpr=planckexpr.diff(omega)
#from scipy import optimize
def omegalim(T): return consts['c']*2*numpy.pi/(1000e-6/T) #-.000321+1 of BB
#% this might have an effect
def solvebeta(omega): return omega/consts['c']
#return sympy.solve(gama(0).subs(consts).subs({'omega':omega}))[0]
#it can't solve this!
def betalim(T): return omegalim(T)/consts['c']
def evalqwint(sfunc,beta1,beta2,T1,T2,omega1=1e-20,no=500):
os=numpy.linspace(omega1,omegalim(max(T1,T2)),no)
sfb=lambda o:sfunc.subs({'omega':o})#gives func of sympy
spec=[intofs( lambda b:sfb(o).evalf(subs={'beta':b}) ,beta1,beta2) for o in os]
#spec=[nintofs( sym2num(str(sfunc)) ,beta1,beta2,o,num=1e3) for o in os]
#np.random.seed(123)
#DTs=np.random.rand(len(os))
#this is introducing a good amount of error in SiC for d=100e-9
DTs=(evalDTs(sym2num(str((Theta(omega,T1)-Theta(omega,T2)))) ,os))
qw= spec*DTs*(os/consts['c'])/numpy.pi**2# no 2pi ??
return qw,os[1]-os[0] #dw
def evalqw(sstr,beta1,beta2,T1,T2,nb=1000,omega1=0,no=1000):
#evals(sym2num(str(sppt+spst)),0,.999999,1000,0,omegalim(300),1000)
so=evals(sstr,beta1,beta2,nb,omega1,omegalim(max(T1,T2)),no)
#so=evals(sstr,beta1,beta2,nb,1.7e14,1.85e14,no)
os=numpy.real(programdata['os'])
dBetas=so[1]*(os/consts['c'])
#np.random.seed(123)
#DTs=np.random.rand(len(os))
#this is introducing a good amount of error in SiC for d=100e-9
DTs=(evalDTs(sym2num(str((Theta(omega,T1)-Theta(omega,T2)))) ,os))
qw= numpy.sum(numpy.real(so[0]),axis=0)*DTs*dBetas/numpy.pi**2# no /2pi ??
return qw,so[2] #dw
import scipy
from scipy import integrate
def evalq(*args,**kwargs):
if args[2]<=1: o=evalqw(*args,**kwargs) #for propagating waves
else: o=evalqwsections(*args,**kwargs)
#return sum(o[0])*o[1]
return scipy.integrate.trapz(o[0],dx=o[1])
def evalqwsections(*args,**kwargs):
#take beta2 as sectioning
#if beta2>1:
args=list(args)
interval=args[2]-args[1]
ac=asymptote()
passn=1
xlim=(numpy.pi/.5e-9)/(omegalim(max(args[3],args[4]))/consts['c'])#~=7e3
#(2*3.14/10e-10)/ #per zhang review of NF thermal rad
#say 10angstrom is local limit
while 1:
if args[1]>xlim:break
o=evalqw(*args,**kwargs)
if passn==1:cur=numpy.zeros(len(programdata['os']))
cur+=o[0]
sm=numpy.sum(o[0])*o[1]
if ac.didit(sm)==True:break
passn+=1
#shift for next interval
args[1]+=interval;args[2]+=interval;
return cur, o[1]
class asymptote():
#expectation of the trend is that it goes up then goes down in the end
#it could vary up and down in b/w
def __init__(self):
self.nums=[]
self.curmax=float("-inf")
def didit(self,num):
tol=.01
if len(self.nums)==0:self.curmax=num
self.nums.append(num)
if self.up()==True:
self.curmax=num
return False
if self.up()==None: return None
if self.up()==False and self.curmax*tol>num:return True
else: return False
def up(self):
if len(self.nums)<2:return None
if self.nums[-1]-self.nums[-2]>0:return True
else: return False
onejay=1j
def sym2num(astr):
"""
to go from a sympy expr to a numexpr expr
"""
astr=astr.replace('I','1j')
astr=astr.replace('1j','onejay') #A HACK!!!
astr=astr.replace('im','imag')
astr=astr.replace('re','real')
astr=astr.replace('Abs','abs')
return astr
#a hold for intermediate data
programdata={}
def geninputarray(betamin,betamax,bnum,omegamin,omegamax,onum):#,num=None):
#onum=100,bnum=100,num=None
# ,betamin=1e-20,betamax=2e6,omegamin=1e-20,omegamax=2e14):
"""inputs in numexpr
extreme:say you want just one line instead of 2D:
specify a close bmin and bmax with bnum=1
betamin will always be evaluated. to evaluate at a certain beta put it
in betamin
"""
# if num!=None:#if num specifided
# bnum=onum=num
#bs=numpy.add(numpy.linspace(betamin,betamax,num=num),0j)
bs=numpy.zeros(bnum,dtype='c8') #could use bigger float but i dont care
bs+=numpy.linspace((betamin),(betamax),num=bnum,endpoint=False)
#add 0j needed to work in expr
os=numpy.zeros(onum,dtype='c8')
os+=numpy.linspace((omegamin),(omegamax),num=onum,endpoint=False)
if os[0]==0:os[0]=1e-20
if bs[0]==0:bs[0]=1e-20
programdata.update({'os':os,'bs':bs})
return (cartesian([bs,os]).transpose()
,(float(betamax)-betamin)/len(bs),(float(omegamax)-omegamin)/len(os))
#fxy eval, dx*dy
#evanescent waves test expressions
sep=sevan('p').subs(consts)
sept=sep.subs({'d':100e-9})
ses= sevan('s').subs(consts)
sest=ses.subs({'d':100e-9})
#propagatin waves test expressions
spp=sprop('p').subs(consts)
sppt= spp.subs({'d':1})
sps=sprop('s').subs(consts)
spst=sps.subs({'d':1})
import numpy as np
#WOW!
def cartesian(arrays, out=None):
#http://stackoverflow.com/questions/1208118/using-numpy-to-build-an-array-of-all-combinations-of-two-arrays
"""
Generate a cartesian product of input arrays.
Parameters
----------
arrays : list of array-like
1-D arrays to form the cartesian product of.
out : ndarray
Array to place the cartesian product in.
Returns
-------
out : ndarray
2-D array of shape (M, len(arrays)) containing cartesian products
formed of input arrays.
Examples
--------
>>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
array([[1, 4, 6],
[1, 4, 7],
[1, 5, 6],
[1, 5, 7],
[2, 4, 6],
[2, 4, 7],
[2, 5, 6],
[2, 5, 7],
[3, 4, 6],
[3, 4, 7],
[3, 5, 6],
[3, 5, 7]])
"""
arrays = [np.asarray(x) for x in arrays]
dtype = arrays[0].dtype
n = np.prod([x.size for x in arrays])
if out is None:
out = np.zeros([n, len(arrays)], dtype=dtype)
m = n / arrays[0].size
out[:,0] = np.repeat(arrays[0], m)
if arrays[1:]:
cartesian(arrays[1:], out=out[0:m,1:])
for j in xrange(1, arrays[0].size):
out[j*m:(j+1)*m,1:] = out[0:m,1:]
return out
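To illustrate the stopping rule used by evalqwsections, here is a simplified, self-contained restatement of the asymptote class above, driven by a toy sequence of per-section sums:

```python
# Simplified restatement of the asymptote class: feed it successive per-section
# sums; it reports convergence once the trend has turned downward and the
# latest section falls below 1% of the peak section.
class AsymptoteDetector:
    def __init__(self, tol=0.01):
        self.tol = tol
        self.nums = []
        self.curmax = float("-inf")
    def done(self, num):
        self.nums.append(num)
        self.curmax = max(self.curmax, num)
        if len(self.nums) < 2:
            return False  # need two points to know the trend
        going_up = self.nums[-1] > self.nums[-2]
        return (not going_up) and num < self.tol * self.curmax

# Toy per-section sums: rise to a peak, then decay, like the spectral
# contribution as the wavevector integration window marches outward.
det = AsymptoteDetector()
sections = [1.0, 5.0, 10.0, 4.0, 1.0, 0.3, 0.05]
stopped_at = None
for i, s in enumerate(sections):
    if det.done(s):
        stopped_at = i
        break
```

Run on this sequence, the detector stops only after the contributions have both peaked and decayed well below the peak, which is exactly the guard evalqwsections relies on to avoid integrating to unrealistically large wavevectors.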