-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide functions for reference data #312
Conversation
WalkthroughThis update significantly enhances the capability to retrieve detailed chemical information from sources like Mendeleev, Wikipedia, and Wolfram Alpha. New modules streamline access to elemental properties, promoting better data accessibility for research and education. Additionally, the environment configurations have been optimized, with various dependencies added and others removed, ensuring a more streamlined and effective functionality. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Mendeleev
participant Wikipedia
participant Wolfram
User->>Mendeleev: Request chemical information
Mendeleev-->>User: Return detailed properties
User->>Wikipedia: Request elastic properties
Wikipedia-->>User: Return elastic property values
User->>Wolfram: Request chemical data
Wolfram-->>User: Return comprehensive chemical properties
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- atomistics/referencedata/mendeleevdb.py (1 hunks)
- atomistics/referencedata/wiki.py (1 hunks)
- atomistics/referencedata/wolfram.py (1 hunks)
Additional context used
Ruff
atomistics/referencedata/wolfram.py
83-83: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
106-109: Combine
if
branches using logicalor
operatorCombine
if
branches(SIM114)
Additional comments not posted (1)
atomistics/referencedata/wiki.py (1)
21-23
: Handle potential URL changes.The Wikipedia URL might change, breaking the function. Consider adding a fallback mechanism or updating the URL dynamically.
Is the URL expected to remain stable? If not, consider adding a fallback mechanism or a way to update the URL dynamically.
atomistics/referencedata/wiki.py
Outdated
for i, p in enumerate(property_lst): | ||
df_tmp = df_lst[i] | ||
property_dict[p] = float( | ||
df_tmp[df_tmp.symbol == chemical_symbol].squeeze(axis=0).to_dict()["WEL[1]"] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for missing chemical symbols.
The function assumes that the chemical symbol will always be found. Add error handling for cases where the symbol is not found.
- property_dict[p] = float(
- df_tmp[df_tmp.symbol == chemical_symbol].squeeze(axis=0).to_dict()["WEL[1]"]
- )
+ try:
+ property_dict[p] = float(
+ df_tmp[df_tmp.symbol == chemical_symbol].squeeze(axis=0).to_dict()["WEL[1]"]
+ )
+ except KeyError:
+ property_dict[p] = None
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
for i, p in enumerate(property_lst): | |
df_tmp = df_lst[i] | |
property_dict[p] = float( | |
df_tmp[df_tmp.symbol == chemical_symbol].squeeze(axis=0).to_dict()["WEL[1]"] | |
) | |
for i, p in enumerate(property_lst): | |
df_tmp = df_lst[i] | |
try: | |
property_dict[p] = float( | |
df_tmp[df_tmp.symbol == chemical_symbol].squeeze(axis=0).to_dict()["WEL[1]"] | |
) | |
except KeyError: | |
property_dict[p] = None |
atomistics/referencedata/wiki.py
Outdated
def get_experimental_elastic_property_wikipedia(chemical_symbol: str) -> dict: | ||
""" | ||
Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com. | ||
|
||
Args: | ||
chemical_symbol (str): Chemical symbol of the element. | ||
property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus | ||
|
||
Returns: | ||
str: Property value (various types): Value of the property for the given element, if available. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve the function documentation.
The return type in the docstring should be dict
instead of str
to match the actual return type.
- str: Property value (various types): Value of the property for the given element, if available.
+ dict: Dictionary containing the property values for the given element, if available.
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def get_experimental_elastic_property_wikipedia(chemical_symbol: str) -> dict: | |
""" | |
Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com. | |
Args: | |
chemical_symbol (str): Chemical symbol of the element. | |
property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus | |
Returns: | |
str: Property value (various types): Value of the property for the given element, if available. | |
""" | |
def get_experimental_elastic_property_wikipedia(chemical_symbol: str) -> dict: | |
""" | |
Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com. | |
Args: | |
chemical_symbol (str): Chemical symbol of the element. | |
property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus | |
Returns: | |
dict: Dictionary containing the property values for the given element, if available. | |
""" |
df = fetch_table("elements") | ||
return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider caching the fetched data.
Fetching the table every time the function is called might be inefficient. Consider caching the data to improve performance.
from functools import lru_cache
@lru_cache(maxsize=1)
def get_chemical_information_from_mendeleev(chemical_symbol: str) -> dict:
...
def get_chemical_information_from_mendeleev(chemical_symbol: str) -> dict: | ||
""" | ||
Get information of a given chemical element | ||
|
||
Args: | ||
chemical_symbol: Chemical Element like Au for Gold | ||
|
||
Returns: | ||
dict: Dictionary with the following keys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for missing chemical symbols.
The function assumes that the chemical symbol will always be found. Add error handling for cases where the symbol is not found.
- return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict()
+ try:
+ return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict()
+ except KeyError:
+ return {}
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def get_chemical_information_from_mendeleev(chemical_symbol: str) -> dict: | |
""" | |
Get information of a given chemical element | |
Args: | |
chemical_symbol: Chemical Element like Au for Gold | |
Returns: | |
dict: Dictionary with the following keys | |
def get_chemical_information_from_mendeleev(chemical_symbol: str) -> dict: | |
""" | |
Get information of a given chemical element | |
Args: | |
chemical_symbol: Chemical Element like Au for Gold | |
Returns: | |
dict: Dictionary with the following keys | |
""" | |
try: | |
return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict() | |
except KeyError: | |
return {} |
atomistics/referencedata/wolfram.py
Outdated
result = pandas.concat( | ||
[ | ||
_collect( | ||
url=v["url"], | ||
column=k, | ||
select_function=v["select_function"], | ||
current_filter=v["current_filter"], | ||
).set_index("element") | ||
for k, v in tqdm(data_dict.items()) | ||
], | ||
axis=1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add logging for download progress.
The function might benefit from logging the download progress to provide feedback to the user.
import logging
logging.basicConfig(level=logging.INFO)
def _wolframalpha_download():
logging.info("Starting download from Wolfram Alpha...")
result = pandas.concat(
[
_collect(
url=v["url"],
column=k,
select_function=v["select_function"],
current_filter=v["current_filter"],
).set_index("element")
for k, v in tqdm(data_dict.items())
],
axis=1,
sort=False,
)
result["volume"] = result.apply(
lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1
)
data_path = os.path.join(os.path.dirname(__file__), "..", "data")
os.makedirs(data_path, exist_ok=True)
result.to_csv(os.path.join(data_path, "wolfram.csv"))
logging.info("Download completed.")
atomistics/referencedata/wolfram.py
Outdated
filename = os.path.join(os.path.dirname(__file__), "..", "data", "wolfram.csv") | ||
if not os.path.exists(filename): | ||
_wolframalpha_download() | ||
df = pandas.read_csv(filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider caching the fetched data.
Fetching the data every time the function is called might be inefficient. Consider caching the data to improve performance.
from functools import lru_cache
@lru_cache(maxsize=1)
def get_chemical_information_from_wolframalpha(chemical_element):
...
atomistics/referencedata/wolfram.py
Outdated
def _wolframalpha_download(): | ||
result = pandas.concat( | ||
[ | ||
_collect( | ||
url=v["url"], | ||
column=k, | ||
select_function=v["select_function"], | ||
current_filter=v["current_filter"], | ||
).set_index("element") | ||
for k, v in tqdm(data_dict.items()) | ||
], | ||
axis=1, | ||
sort=False, | ||
) | ||
result["volume"] = result.apply( | ||
lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1 | ||
) | ||
data_path = os.path.join(os.path.dirname(__file__), "..", "data") | ||
os.makedirs(data_path, exist_ok=True) | ||
result.to_csv(os.path.join(data_path, "wolfram.csv")) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for network-related errors.
The function does not handle network-related errors, which might cause it to fail silently.
- result = pandas.concat(
+ try:
+ result = pandas.concat(
- result.to_csv(os.path.join(data_path, "wolfram.csv"))
+ result.to_csv(os.path.join(data_path, "wolfram.csv"))
+ except requests.RequestException as e:
+ print(f"Network error occurred: {e}")
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _wolframalpha_download(): | |
result = pandas.concat( | |
[ | |
_collect( | |
url=v["url"], | |
column=k, | |
select_function=v["select_function"], | |
current_filter=v["current_filter"], | |
).set_index("element") | |
for k, v in tqdm(data_dict.items()) | |
], | |
axis=1, | |
sort=False, | |
) | |
result["volume"] = result.apply( | |
lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1 | |
) | |
data_path = os.path.join(os.path.dirname(__file__), "..", "data") | |
os.makedirs(data_path, exist_ok=True) | |
result.to_csv(os.path.join(data_path, "wolfram.csv")) | |
def _wolframalpha_download(): | |
try: | |
result = pandas.concat( | |
[ | |
_collect( | |
url=v["url"], | |
column=k, | |
select_function=v["select_function"], | |
current_filter=v["current_filter"], | |
).set_index("element") | |
for k, v in tqdm(data_dict.items()) | |
], | |
axis=1, | |
sort=False, | |
) | |
result["volume"] = result.apply( | |
lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1 | |
) | |
data_path = os.path.join(os.path.dirname(__file__), "..", "data") | |
os.makedirs(data_path, exist_ok=True) | |
result.to_csv(os.path.join(data_path, "wolfram.csv")) | |
except requests.RequestException as e: | |
print(f"Network error occurred: {e}") |
atomistics/referencedata/wolfram.py
Outdated
def get_chemical_information_from_wolframalpha(chemical_element): | ||
""" | ||
Get information of a given chemical element | ||
Args: | ||
chemical_element: Chemical Element like Au for Gold | ||
Returns: | ||
dict: Dictionary with the following keys | ||
element: chemical element | ||
thermalcondictivity: thermal conductivity | ||
atomicradius: calculated distance from nucleus of outermost electron | ||
bulkmodulus: bulk modulus (incompressibility) | ||
shearmodulus: shear modulus of solid | ||
youngmodulus: Young's modulus of solid | ||
poissonratio: Poisson ratio of solid | ||
density: density at standard temperature and pressure | ||
liquiddensity: liquid density at melting point | ||
thermalexpansion: linear thermal expansion coefficient | ||
meltingpoint: melting temperature in kelvin | ||
vaporizationheat: latent heat for liquid-gas transition | ||
specificheat: specific heat capacity | ||
latticeconstant: crystal lattice constants | ||
crystal: basic crystal lattice structure | ||
volmolar: molar volume | ||
mass: average atomic weight in atomic mass units | ||
volume: Volume | ||
""" | ||
filename = os.path.join(os.path.dirname(__file__), "..", "data", "wolfram.csv") | ||
if not os.path.exists(filename): | ||
_wolframalpha_download() | ||
df = pandas.read_csv(filename) | ||
return df[df.element == chemical_element].squeeze(axis=0).to_dict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for missing chemical elements.
The function assumes that the chemical element will always be found. Add error handling for cases where the element is not found.
- return df[df.element == chemical_element].squeeze(axis=0).to_dict()
+ try:
+ return df[df.element == chemical_element].squeeze(axis=0).to_dict()
+ except KeyError:
+ return {}
Committable suggestion was skipped due to low confidence.
for more information, see https://pre-commit.ci
…o reference_data # Conflicts: # atomistics/referencedata/__init__.py
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
Outside diff range, codebase verification and nitpick comments (4)
tests/test_referencedata.py (3)
29-31
: Consider adding more assertions.The test currently checks only the atomic number. Adding more assertions can improve test coverage and ensure the function retrieves all expected data correctly.
self.assertEqual(al_data["atomic_number"], 13) self.assertEqual(al_data["symbol"], "Al") self.assertEqual(al_data["name"], "Aluminum")
37-39
: Consider adding more assertions.The test currently checks only the lattice constant. Adding more assertions can improve test coverage and ensure the function retrieves all expected data correctly.
self.assertEqual(al_data["latticeconstant"], "(404.95, 404.95, 404.95)") self.assertEqual(al_data["atomic_number"], 13) self.assertEqual(al_data["symbol"], "Al") self.assertEqual(al_data["name"], "Aluminum")
44-53
: Consider adding more assertions.The test currently checks only the elastic properties. Adding more assertions can improve test coverage and ensure the function retrieves all expected data correctly.
self.assertEqual(al_data["bulk_modulus"], 76.0) self.assertEqual(al_data["poissons_ratio"], 0.35) self.assertEqual(al_data["shear_modulus"], 26.0) self.assertEqual(al_data["youngs_modulus"], 70.0) self.assertEqual(al_data["symbol"], "Al") self.assertEqual(al_data["name"], "Aluminum")pyproject.toml (1)
59-64
: LGTM! Consider adding a comment explaining the purpose of theexperimental
section.The addition of the
experimental
section is a good way to manage dependencies that are not yet fully integrated into the main project workflow. Adding a brief comment can help other developers understand its purpose.[project.optional-dependencies] # Dependencies that are still experimental and not fully integrated experimental = [ "lxml==5.2.2", "mendeleev==0.17.0", "pandas==2.2.2", "requests==2.32.3", ]
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (6)
- .ci_support/environment-old.yml (1 hunks)
- .ci_support/environment.yml (1 hunks)
- atomistics/referencedata/init.py (1 hunks)
- atomistics/referencedata/wolfram.py (1 hunks)
- pyproject.toml (1 hunks)
- tests/test_referencedata.py (1 hunks)
Additional context used
Ruff
tests/test_referencedata.py
17-17:
lxml
imported but unused; consider usingimportlib.util.find_spec
to test for availability(F401)
atomistics/referencedata/wolfram.py
82-82: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
105-108: Combine
if
branches using logicalor
operatorCombine
if
branches(SIM114)
Additional comments not posted (20)
.ci_support/environment.yml (6)
6-6
: Dependency Addition: lxmlThe addition of
lxml =5.2.2
looks good. Ensure that this version is compatible with other dependencies in the environment.
7-7
: Dependency Addition: mendeleevThe addition of
mendeleev =0.17.0
looks good. Ensure that this version is compatible with other dependencies in the environment.
9-9
: Dependency Addition: pandasThe addition of
pandas =2.2.2
looks good. Ensure that this version is compatible with other dependencies in the environment.
10-10
: Dependency Addition: phonopyThe addition of
phonopy =2.26.6
looks good. Ensure that this version is compatible with other dependencies in the environment.
11-11
: Dependency Addition: requestsThe addition of
requests =2.32.3
looks good. Ensure that this version is compatible with other dependencies in the environment.
13-13
: Dependency Addition: seekpathThe addition of
seekpath =2.1.0
looks good. Ensure that this version is compatible with other dependencies in the environment..ci_support/environment-old.yml (7)
5-5
: Dependency Addition: dynaphopyThe addition of
dynaphopy =1.17.5
looks good. Ensure that this version is compatible with other dependencies in the environment.
7-7
: Dependency Addition: iprpy-dataThe addition of
iprpy-data =2023.07.25
looks good. Ensure that this version is compatible with other dependencies in the environment.
8-8
: Dependency Addition: jinja2The addition of
jinja2 =2.11.3
looks good. Ensure that this version is compatible with other dependencies in the environment.
9-9
: Dependency Addition: lammpsThe addition of
lammps =2022.06.23
looks good. Ensure that this version is compatible with other dependencies in the environment.
10-10
: Dependency Addition: lxmlThe addition of
lxml =4.9.1
looks good. Ensure that this version is compatible with other dependencies in the environment.
11-11
: Dependency Addition: mendeleevThe addition of
mendeleev =0.10.0
looks good. Ensure that this version is compatible with other dependencies in the environment.
16-16
: Dependency Addition: requestsThe addition of
requests =2.24.0
looks good. Ensure that this version is compatible with other dependencies in the environment.atomistics/referencedata/__init__.py (3)
1-1
: Import Statement: get_experimental_elastic_property_wikipediaThe import of
get_experimental_elastic_property_wikipedia
fromatomistics.referencedata.wiki
looks good. Ensure that this function exists in the specified module.
3-14
: Import Statements and Error Handling: get_chemical_information_from_mendeleev and get_chemical_information_from_wolframalphaThe import of
get_chemical_information_from_mendeleev
andget_chemical_information_from_wolframalpha
fromatomistics.referencedata.mendeleevdb
looks good. The try-except block effectively handles potential import errors. Ensure that these functions exist in the specified module.
9-17
: all DeclarationThe
__all__
declaration correctly lists the imported functions. This ensures that only the specified functions are accessible when the module is imported.atomistics/referencedata/wolfram.py (4)
17-21
: LGTM!The function correctly handles the conversion of density values from g/l to kg/l.
24-28
: LGTM!The function correctly handles the conversion of split string values to float.
31-36
: LGTM!The function correctly handles the conversion of lattice constant values from a split string.
39-43
: LGTM!The function correctly handles the conversion of scientific notation strings to float values.
atomistics/referencedata/wolfram.py
Outdated
def _wolframalpha_download(): | ||
result = pandas.concat( | ||
[ | ||
_collect( | ||
url=v["url"], | ||
column=k, | ||
select_function=v["select_function"], | ||
current_filter=v["current_filter"], | ||
).set_index("element") | ||
for k, v in data_dict.items() | ||
], | ||
axis=1, | ||
sort=False, | ||
) | ||
result["volume"] = result.apply( | ||
lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1 | ||
) | ||
data_path = os.path.join(os.path.dirname(__file__), "data") | ||
os.makedirs(data_path, exist_ok=True) | ||
result.to_csv(os.path.join(data_path, "wolfram.csv")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for network-related errors.
The function does not handle network-related errors, which might cause it to fail silently.
- result = pandas.concat(
+ try:
+ result = pandas.concat(
- result.to_csv(os.path.join(data_path, "wolfram.csv"))
+ result.to_csv(os.path.join(data_path, "wolfram.csv"))
+ except requests.RequestException as e:
+ print(f"Network error occurred: {e}")
Committable suggestion was skipped due to low confidence.
atomistics/referencedata/wolfram.py
Outdated
def _extract_entry(df, n): | ||
for i in range(int(len(df.columns) / 2)): | ||
if n in df[i].values: | ||
v = df[i + 1].values[df[i].values.tolist().index(n)] | ||
return v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for missing values.
The function does not handle cases where the value is not found in the DataFrame, which might cause it to fail silently.
- v = df[i + 1].values[df[i].values.tolist().index(n)]
+ try:
+ v = df[i + 1].values[df[i].values.tolist().index(n)]
+ except ValueError:
+ print(f"Value {n} not found in DataFrame.")
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _extract_entry(df, n): | |
for i in range(int(len(df.columns) / 2)): | |
if n in df[i].values: | |
v = df[i + 1].values[df[i].values.tolist().index(n)] | |
return v | |
def _extract_entry(df, n): | |
for i in range(int(len(df.columns) / 2)): | |
if n in df[i].values: | |
try: | |
v = df[i + 1].values[df[i].values.tolist().index(n)] | |
except ValueError: | |
print(f"Value {n} not found in DataFrame.") | |
return v |
atomistics/referencedata/wolfram.py
Outdated
def get_chemical_information_from_wolframalpha(chemical_element): | ||
""" | ||
Get information of a given chemical element | ||
Args: | ||
chemical_element: Chemical Element like Au for Gold | ||
Returns: | ||
dict: Dictionary with the following keys | ||
element: chemical element | ||
thermalcondictivity: thermal conductivity | ||
atomicradius: calculated distance from nucleus of outermost electron | ||
bulkmodulus: bulk modulus (incompressibility) | ||
shearmodulus: shear modulus of solid | ||
youngmodulus: Young's modulus of solid | ||
poissonratio: Poisson ratio of solid | ||
density: density at standard temperature and pressure | ||
liquiddensity: liquid density at melting point | ||
thermalexpansion: linear thermal expansion coefficient | ||
meltingpoint: melting temperature in kelvin | ||
vaporizationheat: latent heat for liquid-gas transition | ||
specificheat: specific heat capacity | ||
latticeconstant: crystal lattice constants | ||
crystal: basic crystal lattice structure | ||
volmolar: molar volume | ||
mass: average atomic weight in atomic mass units | ||
volume: Volume | ||
""" | ||
filename = os.path.join(os.path.dirname(__file__), "data", "wolfram.csv") | ||
if not os.path.exists(filename): | ||
_wolframalpha_download() | ||
df = pandas.read_csv(filename) | ||
return df[df.element == chemical_element].squeeze(axis=0).to_dict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for missing chemical elements.
The function assumes that the chemical element will always be found. Add error handling for cases where the element is not found.
- return df[df.element == chemical_element].squeeze(axis=0).to_dict()
+ try:
+ return df[df.element == chemical_element].squeeze(axis=0).to_dict()
+ except KeyError:
+ return {}
Committable suggestion was skipped due to low confidence.
atomistics/referencedata/wolfram.py
Outdated
def _get_content_from_url(url): | ||
content = pandas.read_html(requests.get(url).text) | ||
if len(content[8]) > 1: | ||
return content[8] | ||
else: | ||
return content[9] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for network-related errors.
The function does not handle network-related errors, which might cause it to fail silently.
- content = pandas.read_html(requests.get(url).text)
+ try:
+ response = requests.get(url)
+ response.raise_for_status()
+ content = pandas.read_html(response.text)
+ except requests.RequestException as e:
+ print(f"Network error occurred: {e}")
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _get_content_from_url(url): | |
content = pandas.read_html(requests.get(url).text) | |
if len(content[8]) > 1: | |
return content[8] | |
else: | |
return content[9] | |
def _get_content_from_url(url): | |
try: | |
response = requests.get(url) | |
response.raise_for_status() | |
content = pandas.read_html(response.text) | |
except requests.RequestException as e: | |
print(f"Network error occurred: {e}") | |
return None | |
if len(content[8]) > 1: | |
return content[8] | |
else: | |
return content[9] |
atomistics/referencedata/wolfram.py
Outdated
def _extract_lst(df, column, select_function, current_filter): | ||
element_lst, property_lst = [], [] | ||
try: | ||
ptable = fetch_table("elements") | ||
for n, el in zip(ptable.name.values, ptable.symbol.values): | ||
v = _extract_entry(df=df, n=n) | ||
if current_filter(v): | ||
element_lst.append(el) | ||
property_lst.append(select_function(v)) | ||
except ValueError: | ||
raise ValueError(column, el, v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve error handling by providing more context.
The function raises a ValueError
without providing context about the error. Use raise ... from None
to distinguish it from errors in exception handling.
- except ValueError:
- raise ValueError(column, el, v)
+ except ValueError as e:
+ raise ValueError(f"Error extracting {column} for element {el} with value {v}") from e
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _extract_lst(df, column, select_function, current_filter): | |
element_lst, property_lst = [], [] | |
try: | |
ptable = fetch_table("elements") | |
for n, el in zip(ptable.name.values, ptable.symbol.values): | |
v = _extract_entry(df=df, n=n) | |
if current_filter(v): | |
element_lst.append(el) | |
property_lst.append(select_function(v)) | |
except ValueError: | |
raise ValueError(column, el, v) | |
def _extract_lst(df, column, select_function, current_filter): | |
element_lst, property_lst = [], [] | |
try: | |
ptable = fetch_table("elements") | |
for n, el in zip(ptable.name.values, ptable.symbol.values): | |
v = _extract_entry(df=df, n=n) | |
if current_filter(v): | |
element_lst.append(el) | |
property_lst.append(select_function(v)) | |
except ValueError as e: | |
raise ValueError(f"Error extracting {column} for element {el} with value {v}") from e |
Tools
Ruff
82-82: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
atomistics/referencedata/wolfram.py
Outdated
def _get_volume(lat_lst, crystal): | ||
if not isinstance(lat_lst, float) and len(lat_lst) == 3: | ||
if crystal == "Face-centered Cubic": | ||
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100 | ||
elif crystal == "Body-centered Cubic": | ||
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combine the if conditions.
The if conditions for Face-centered Cubic
and Body-centered Cubic
are redundant and can be combined.
- if crystal == "Face-centered Cubic":
- return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
- elif crystal == "Body-centered Cubic":
+ if crystal in ["Face-centered Cubic", "Body-centered Cubic"]:
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _get_volume(lat_lst, crystal): | |
if not isinstance(lat_lst, float) and len(lat_lst) == 3: | |
if crystal == "Face-centered Cubic": | |
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100 | |
elif crystal == "Body-centered Cubic": | |
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100 | |
def _get_volume(lat_lst, crystal): | |
if not isinstance(lat_lst, float) and len(lat_lst) == 3: | |
if crystal in ["Face-centered Cubic", "Body-centered Cubic"]: | |
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100 |
Tools
Ruff
105-108: Combine
if
branches using logicalor
operatorCombine
if
branches(SIM114)
I need an example to see how you imagined the interaction with the database. |
That's what the unit tests show, right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
Outside diff range, codebase verification and nitpick comments (1)
tests/test_referencedata.py (1)
1-3
: Remove unnecessary blank lines.There are unnecessary blank lines at the beginning of the file.
import unittest from atomistics.referencedata import get_experimental_elastic_property_wikipedia
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- atomistics/referencedata/wolfram.py (1 hunks)
- tests/test_referencedata.py (1 hunks)
Additional context used
Ruff
tests/test_referencedata.py
17-17:
lxml
imported but unused; consider usingimportlib.util.find_spec
to test for availability(F401)
atomistics/referencedata/wolfram.py
201-201: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
250-253: Combine
if
branches using logicalor
operatorCombine
if
branches(SIM114)
Additional comments not posted (13)
tests/test_referencedata.py (3)
29-32
: LGTM!The test method is well-written and conditionally skipped if
mendeleev
is not available.
37-39
: LGTM!The test method is well-written and conditionally skipped if
mendeleev
is not available.
44-54
: LGTM!The test method is well-written and conditionally skipped if
lxml
is not available.atomistics/referencedata/wolfram.py (10)
27-40
: LGTM!The function is straightforward and correctly handles different density units.
43-56
: LGTM!The function is straightforward and correctly handles both string and float inputs.
59-73
: LGTM!The function is straightforward and correctly handles the conversion.
76-93
: LGTM!The function is straightforward and correctly handles the conversion.
114-124
: LGTM!The function is straightforward and correctly handles the validation.
127-137
: LGTM!The function is straightforward and correctly handles the validation.
140-151
: LGTM!The function is straightforward and correctly handles the selection.
154-168
: LGTM!The function is straightforward and correctly handles the selection.
205-233
: LGTM!The function is straightforward and correctly handles the data collection and extraction.
96-111
: Add error handling for missing values.The function does not handle cases where the value is not found in the DataFrame, which might cause it to fail silently.
return v try: return v except ValueError: print(f"Value {n} not found in DataFrame.")Likely invalid or redundant comment.
try: | ||
from atomistics.referencedata import ( | ||
get_chemical_information_from_mendeleev, | ||
get_chemical_information_from_wolframalpha, | ||
) | ||
|
||
mendeleev_not_available = False | ||
except ImportError: | ||
mendeleev_not_available = True | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using importlib.util.find_spec
for conditional imports.
Instead of using try-except blocks for conditional imports, consider using importlib.util.find_spec
to check for module availability.
try:
from atomistics.referencedata import (
get_chemical_information_from_mendeleev,
get_chemical_information_from_wolframalpha,
)
mendeleev_not_available = False
except ImportError:
mendeleev_not_available = True
import importlib.util
mendeleev_not_available = importlib.util.find_spec("atomistics.referencedata") is None
if not mendeleev_not_available:
from atomistics.referencedata import (
get_chemical_information_from_mendeleev,
get_chemical_information_from_wolframalpha,
)
Committable suggestion was skipped due to low confidence.
try: | ||
import lxml | ||
|
||
lxml_not_available = False | ||
except ImportError: | ||
lxml_not_available = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using importlib.util.find_spec
for conditional imports.
Instead of using try-except blocks for conditional imports, consider using importlib.util.find_spec
to check for module availability.
try:
import lxml
lxml_not_available = False
except ImportError:
lxml_not_available = True
lxml_not_available = importlib.util.find_spec("lxml") is None
if not lxml_not_available:
import lxml
Committable suggestion was skipped due to low confidence.
Tools
Ruff
17-17:
lxml
imported but unused; consider usingimportlib.util.find_spec
to test for availability(F401)
atomistics/referencedata/wolfram.py
Outdated
"url": "https://periodictable.com/Properties/A/ThermalConductivity.an.html", | ||
"select_function": _select_function_split, | ||
"current_filter": _default_filter, | ||
}, | ||
"atomicradius": { | ||
"url": "https://periodictable.com/Properties/A/AtomicRadius.an.html", | ||
"select_function": _select_function_split, | ||
"current_filter": _default_filter, | ||
}, | ||
"bulkmodulus": { | ||
"url": "https://periodictable.com/Properties/A/BulkModulus.an.html", | ||
"select_function": _select_function_split, | ||
"current_filter": _default_filter, | ||
}, | ||
"shearmodulus": { | ||
"url": "https://periodictable.com/Properties/A/ShearModulus.an.html", | ||
"select_function": _select_function_split, | ||
"current_filter": _default_filter, | ||
}, | ||
"youngmodulus": { | ||
"url": "https://periodictable.com/Properties/A/YoungModulus.an.html", | ||
"select_function": _select_function_split, | ||
"current_filter": _default_filter, | ||
}, | ||
"poissonratio": { | ||
"url": "https://periodictable.com/Properties/A/PoissonRatio.an.html", | ||
"select_function": _select_function_poisson, | ||
"current_filter": _poisson_filter, | ||
}, | ||
"density": { | ||
"url": "https://periodictable.com/Properties/A/Density.an.html", | ||
"select_function": _select_function_density, | ||
"current_filter": _default_filter, | ||
}, | ||
"liquiddensity": { | ||
"url": "https://periodictable.com/Properties/A/LiquidDensity.an.html", | ||
"select_function": _select_function_split, | ||
"current_filter": _default_filter, | ||
}, | ||
"thermalexpansion": { | ||
"url": "https://periodictable.com/Properties/A/ThermalExpansion.an.html", | ||
"select_function": _select_function_scientific, | ||
"current_filter": _default_filter, | ||
}, | ||
"meltingpoint": { | ||
"url": "https://periodictable.com/Properties/A/AbsoluteMeltingPoint.an.html", | ||
"select_function": _select_function_scientific, | ||
"current_filter": _default_filter, | ||
}, | ||
"vaporizationheat": { | ||
"url": "https://periodictable.com/Properties/A/VaporizationHeat.an.html", | ||
"select_function": _select_function_split, | ||
"current_filter": _default_filter, | ||
}, | ||
"specificheat": { | ||
"url": "https://periodictable.com/Properties/A/SpecificHeat.an.html", | ||
"select_function": _select_function_split, | ||
"current_filter": _default_filter, | ||
}, | ||
"latticeconstant": { | ||
"url": "https://periodictable.com/Properties/A/LatticeConstants.an.html", | ||
"select_function": _select_function_lattice, | ||
"current_filter": _default_filter, | ||
}, | ||
"crystal": { | ||
"url": "https://periodictable.com/Properties/A/CrystalStructure.an.html", | ||
"select_function": _select_function_poisson, | ||
"current_filter": _default_filter, | ||
}, | ||
"volmolar": { | ||
"url": "https://periodictable.com/Properties/A/MolarVolume.an.html", | ||
"select_function": _select_function_scientific, | ||
"current_filter": _default_filter, | ||
}, | ||
"mass": { | ||
"url": "https://periodictable.com/Properties/A/AtomicMass.an.html", | ||
"select_function": _select_function_mass, | ||
"current_filter": _default_filter, | ||
}, | ||
} | ||
result = pandas.concat( | ||
[ | ||
_collect( | ||
url=v["url"], | ||
column=k, | ||
select_function=v["select_function"], | ||
current_filter=v["current_filter"], | ||
).set_index("element") | ||
for k, v in data_dict.items() | ||
], | ||
axis=1, | ||
sort=False, | ||
) | ||
result["volume"] = result.apply( | ||
lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1 | ||
) | ||
data_path = os.path.join(os.path.dirname(__file__), "data") | ||
os.makedirs(data_path, exist_ok=True) | ||
result.to_csv(os.path.join(data_path, "wolfram.csv")) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add logging for download progress and error handling for network-related errors.
The function might benefit from logging the download progress to provide feedback to the user and handling network-related errors.
import logging
logging.basicConfig(level=logging.INFO)
def _wolframalpha_download() -> None:
"""
Downloads data from Wolfram Alpha and saves it to a CSV file.
This function downloads various properties data from Wolfram Alpha using the provided URLs in the `data_dict`.
The downloaded data is then processed and saved to a CSV file named "wolfram.csv" in the "data" directory.
Note: This function requires internet connectivity to download the data.
"""
data_dict = {
"thermalcondictivity": {
"url": "https://periodictable.com/Properties/A/ThermalConductivity.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"atomicradius": {
"url": "https://periodictable.com/Properties/A/AtomicRadius.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"bulkmodulus": {
"url": "https://periodictable.com/Properties/A/BulkModulus.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"shearmodulus": {
"url": "https://periodictable.com/Properties/A/ShearModulus.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"youngmodulus": {
"url": "https://periodictable.com/Properties/A/YoungModulus.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"poissonratio": {
"url": "https://periodictable.com/Properties/A/PoissonRatio.an.html",
"select_function": _select_function_poisson,
"current_filter": _poisson_filter,
},
"density": {
"url": "https://periodictable.com/Properties/A/Density.an.html",
"select_function": _select_function_density,
"current_filter": _default_filter,
},
"liquiddensity": {
"url": "https://periodictable.com/Properties/A/LiquidDensity.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"thermalexpansion": {
"url": "https://periodictable.com/Properties/A/ThermalExpansion.an.html",
"select_function": _select_function_scientific,
"current_filter": _default_filter,
},
"meltingpoint": {
"url": "https://periodictable.com/Properties/A/AbsoluteMeltingPoint.an.html",
"select_function": _select_function_scientific,
"current_filter": _default_filter,
},
"vaporizationheat": {
"url": "https://periodictable.com/Properties/A/VaporizationHeat.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"specificheat": {
"url": "https://periodictable.com/Properties/A/SpecificHeat.an.html",
"select_function": _select_function_split,
"current_filter": _default_filter,
},
"latticeconstant": {
"url": "https://periodictable.com/Properties/A/LatticeConstants.an.html",
"select_function": _select_function_lattice,
"current_filter": _default_filter,
},
"crystal": {
"url": "https://periodictable.com/Properties/A/CrystalStructure.an.html",
"select_function": _select_function_poisson,
"current_filter": _default_filter,
},
"volmolar": {
<!-- This is an auto-generated comment by CodeRabbit -->
atomistics/referencedata/wolfram.py
Outdated
def _get_volume( | ||
lat_lst: Union[float, Tuple[float, float, float]], crystal: str | ||
) -> Optional[float]: | ||
""" | ||
Calculate the volume of a crystal unit cell. | ||
|
||
Args: | ||
lat_lst: Lattice constants of the crystal unit cell. | ||
crystal: Basic crystal lattice structure. | ||
|
||
Returns: | ||
The volume of the crystal unit cell in cubic meters, or None if the input is invalid. | ||
""" | ||
if not isinstance(lat_lst, float) and len(lat_lst) == 3: | ||
if crystal == "Face-centered Cubic": | ||
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100 | ||
elif crystal == "Body-centered Cubic": | ||
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100 | ||
elif crystal == "Simple Hexagonal": | ||
return ( | ||
lat_lst[0] | ||
* lat_lst[1] | ||
* lat_lst[2] | ||
* 3 | ||
* np.sqrt(3) | ||
/ 2 | ||
/ 100 | ||
/ 100 | ||
/ 100 | ||
) | ||
else: | ||
return None | ||
else: | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combine the if conditions.
The if conditions for "Face-centered Cubic" and "Body-centered Cubic" crystals are redundant and can be combined.
if crystal == "Face-centered Cubic":
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
elif crystal == "Body-centered Cubic":
if crystal in ["Face-centered Cubic", "Body-centered Cubic"]:
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _get_volume( | |
lat_lst: Union[float, Tuple[float, float, float]], crystal: str | |
) -> Optional[float]: | |
""" | |
Calculate the volume of a crystal unit cell. | |
Args: | |
lat_lst: Lattice constants of the crystal unit cell. | |
crystal: Basic crystal lattice structure. | |
Returns: | |
The volume of the crystal unit cell in cubic meters, or None if the input is invalid. | |
""" | |
if not isinstance(lat_lst, float) and len(lat_lst) == 3: | |
if crystal == "Face-centered Cubic": | |
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100 | |
elif crystal == "Body-centered Cubic": | |
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100 | |
elif crystal == "Simple Hexagonal": | |
return ( | |
lat_lst[0] | |
* lat_lst[1] | |
* lat_lst[2] | |
* 3 | |
* np.sqrt(3) | |
/ 2 | |
/ 100 | |
/ 100 | |
/ 100 | |
) | |
else: | |
return None | |
else: | |
return None | |
if not isinstance(lat_lst, float) and len(lat_lst) == 3: | |
if crystal in ["Face-centered Cubic", "Body-centered Cubic"]: | |
return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100 | |
elif crystal == "Simple Hexagonal": | |
return ( | |
lat_lst[0] | |
* lat_lst[1] | |
* lat_lst[2] | |
* 3 | |
* np.sqrt(3) | |
/ 2 | |
/ 100 | |
/ 100 | |
/ 100 | |
) | |
else: | |
return None | |
else: | |
return None |
Tools
Ruff
250-253: Combine
if
branches using logicalor
operatorCombine
if
branches(SIM114)
atomistics/referencedata/wolfram.py
Outdated
def _get_content_from_url(url: str) -> pandas.DataFrame: | ||
""" | ||
Retrieves content from the given URL and returns a pandas DataFrame. | ||
|
||
Args: | ||
url (str): The URL to retrieve content from. | ||
|
||
Returns: | ||
pandas.DataFrame: The content retrieved from the URL as a pandas DataFrame. | ||
""" | ||
content = pandas.read_html(requests.get(url).text) | ||
if len(content[8]) > 1: | ||
return content[8] | ||
else: | ||
return content[9] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for network-related errors.
The function does not handle network-related errors, which might cause it to fail silently.
content = pandas.read_html(requests.get(url).text)
try:
response = requests.get(url)
response.raise_for_status()
content = pandas.read_html(response.text)
except requests.RequestException as e:
print(f"Network error occurred: {e}")
Committable suggestion was skipped due to low confidence.
atomistics/referencedata/wolfram.py
Outdated
def _extract_lst( | ||
df: pandas.DataFrame, | ||
column: str, | ||
select_function: Callable, | ||
current_filter: Callable, | ||
) -> Tuple[List[str], List[str]]: | ||
""" | ||
Extracts element and property lists from a DataFrame based on the given column, select function, and current filter. | ||
|
||
Args: | ||
df (DataFrame): The DataFrame to extract data from. | ||
column (str): The column name to extract data from. | ||
select_function (Callable): A function that selects a property value from a DataFrame entry. | ||
current_filter (Callable): A function that filters DataFrame entries based on a condition. | ||
|
||
Returns: | ||
Tuple[List[str], List[str]]: A tuple containing the element list and property list. | ||
|
||
Raises: | ||
ValueError: If an error occurs during the extraction process. | ||
""" | ||
element_lst, property_lst = [], [] | ||
try: | ||
ptable = fetch_table("elements") | ||
for n, el in zip(ptable.name.values, ptable.symbol.values): | ||
v = _extract_entry(df=df, n=n) | ||
if current_filter(v): | ||
element_lst.append(el) | ||
property_lst.append(select_function(v)) | ||
except ValueError: | ||
raise ValueError(column, el, v) | ||
return element_lst, property_lst |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve error handling by providing more context.
The function raises a ValueError
without providing context about the error. Use raise ... from None
to distinguish it from errors in exception handling.
raise ValueError(column, el, v)
raise ValueError(f"Error extracting {column} for element {el} with value {v}") from e
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _extract_lst( | |
df: pandas.DataFrame, | |
column: str, | |
select_function: Callable, | |
current_filter: Callable, | |
) -> Tuple[List[str], List[str]]: | |
""" | |
Extracts element and property lists from a DataFrame based on the given column, select function, and current filter. | |
Args: | |
df (DataFrame): The DataFrame to extract data from. | |
column (str): The column name to extract data from. | |
select_function (Callable): A function that selects a property value from a DataFrame entry. | |
current_filter (Callable): A function that filters DataFrame entries based on a condition. | |
Returns: | |
Tuple[List[str], List[str]]: A tuple containing the element list and property list. | |
Raises: | |
ValueError: If an error occurs during the extraction process. | |
""" | |
element_lst, property_lst = [], [] | |
try: | |
ptable = fetch_table("elements") | |
for n, el in zip(ptable.name.values, ptable.symbol.values): | |
v = _extract_entry(df=df, n=n) | |
if current_filter(v): | |
element_lst.append(el) | |
property_lst.append(select_function(v)) | |
except ValueError: | |
raise ValueError(column, el, v) | |
return element_lst, property_lst | |
def _extract_lst( | |
df: pandas.DataFrame, | |
column: str, | |
select_function: Callable, | |
current_filter: Callable, | |
) -> Tuple[List[str], List[str]]: | |
""" | |
Extracts element and property lists from a DataFrame based on the given column, select function, and current filter. | |
Args: | |
df (DataFrame): The DataFrame to extract data from. | |
column (str): The column name to extract data from. | |
select_function (Callable): A function that selects a property value from a DataFrame entry. | |
current_filter (Callable): A function that filters DataFrame entries based on a condition. | |
Returns: | |
Tuple[List[str], List[str]]: A tuple containing the element list and property list. | |
Raises: | |
ValueError: If an error occurs during the extraction process. | |
""" | |
element_lst, property_lst = [], [] | |
try: | |
ptable = fetch_table("elements") | |
for n, el in zip(ptable.name.values, ptable.symbol.values): | |
v = _extract_entry(df=df, n=n) | |
if current_filter(v): | |
element_lst.append(el) | |
property_lst.append(select_function(v)) | |
except ValueError as e: | |
raise ValueError(f"Error extracting {column} for element {el} with value {v}") from e | |
return element_lst, property_lst |
Tools
Ruff
201-201: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
for more information, see https://pre-commit.ci
Merge main
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The names must be consistent: currently get_{property}_from_{source}
for Wolfram and Mendeleev but get_{property}_{source}
for Wikipedia. And I would also avoid putting specifiers, in this case experimental
, in the function names. There's anyway no way to get theoretical elastic constants, but in general I would put experimental yes/no etc. in the function arguments.
Actually I would simply call these functions something like from referencedata import wikipedia
elast = wikipedia.get_elastic_property("Al") because this way the hierarchy becomes clearer, it avoids having to import too many functions and the function names are shorter. |
…y_wikipedia() to get_elastic_properties_from_wikipedia()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Outside diff range, codebase verification and nitpick comments (2)
tests/test_referencedata.py (2)
5-14
: Consider usingimportlib.util.find_spec
for conditional imports.Instead of using try-except blocks for conditional imports, consider using
importlib.util.find_spec
to check for module availability.- try: - from atomistics.referencedata import ( - get_chemical_information_from_mendeleev, - get_chemical_information_from_wolframalpha, - ) - mendeleev_not_available = False - except ImportError: - mendeleev_not_available = True + import importlib.util + mendeleev_not_available = importlib.util.find_spec("atomistics.referencedata") is None + if not mendeleev_not_available: + from atomistics.referencedata import ( + get_chemical_information_from_mendeleev, + get_chemical_information_from_wolframalpha, + )
16-21
: Remove unused importlxml
.The
lxml
module is imported but not used in the code. Consider usingimportlib.util.find_spec
to check for module availability instead.- try: - import lxml - lxml_not_available = False - except ImportError: - lxml_not_available = True + import importlib.util + lxml_not_available = importlib.util.find_spec("lxml") is NoneTools
Ruff
17-17:
lxml
imported but unused; consider usingimportlib.util.find_spec
to test for availability(F401)
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- atomistics/referencedata/init.py (1 hunks)
- atomistics/referencedata/wiki.py (1 hunks)
- atomistics/referencedata/wolfram.py (1 hunks)
- tests/test_referencedata.py (1 hunks)
Files skipped from review as they are similar to previous changes (3)
- atomistics/referencedata/init.py
- atomistics/referencedata/wiki.py
- atomistics/referencedata/wolfram.py
Additional context used
Ruff
tests/test_referencedata.py
17-17:
lxml
imported but unused; consider usingimportlib.util.find_spec
to test for availability(F401)
Additional comments not posted (6)
tests/test_referencedata.py (6)
24-28
: LGTM!The
unittest.skipIf
condition is correctly implemented to skip the test ifmendeleev
is not available.
29-32
: LGTM!The test method correctly checks if the atomic number of Aluminum is 13.
33-36
: LGTM!The
unittest.skipIf
condition is correctly implemented to skip the test ifmendeleev
is not available.
37-40
: LGTM!The test method correctly checks if the lattice constant of Aluminum is retrieved as expected.
41-43
: LGTM!The
unittest.skipIf
condition is correctly implemented to skip the test iflxml
is not available.
44-54
: LGTM!The test method correctly checks if the elastic properties of Aluminum are retrieved as expected.
from atomistics.referencedata import get_elastic_properties_from_wikipedia as well as from atomistics.referencedata.wikipedia import get_elastic_properties
for more information, see https://pre-commit.ci
@samwaseda Support both: from atomistics.referencedata import get_elastic_properties_from_wikipedia as well as from atomistics.referencedata.wikipedia import get_elastic_properties I like the first one, as it allows me to get a quick overview of which modules are available but I can also live with the second one in particular for properties which are available from multiple sources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok for the rest we have to see how things evolve. I don't think we'll suffer from backward incompatibility in the near future, so you can merge it
property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus | ||
|
||
Returns: | ||
str: Property value (various types): Value of the property for the given element, if available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str: Property value (various types): Value of the property for the given element, if available. | |
str: Property value (various types): Value of the property for the given element in GPa, if available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 13
Outside diff range, codebase verification and nitpick comments (2)
atomistics/referencedata/mendeleev.py (1)
4-112
: Specify the return type in the docstring.The docstring should specify that the function returns a dictionary.
- dict: Dictionary with the following keys + dict: A dictionary containing the following keysatomistics/referencedata/wolframalpha.py (1)
384-414
: Specify the return type in the docstring.The docstring should specify that the function returns a dictionary.
- dict: Dictionary with the following keys + dict: A dictionary containing the following keys
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- atomistics/referencedata/init.py (1 hunks)
- atomistics/referencedata/mendeleev.py (1 hunks)
- atomistics/referencedata/wikipedia.py (1 hunks)
- atomistics/referencedata/wolframalpha.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- atomistics/referencedata/init.py
Additional comments not posted (2)
atomistics/referencedata/wikipedia.py (1)
25-29
: Verify table order assumption and handle potential changes.The function assumes that the order of tables in the Wikipedia page will not change, which might not be a safe assumption. Consider adding a verification step to ensure the correct table is being used.
+ if len(df_lst) < len(property_lst): + raise ValueError("The number of tables on the Wikipedia page is less than expected.")atomistics/referencedata/wolframalpha.py (1)
21-24
: Verify table order assumption and handle potential changes.The function assumes that the content will always be in the 8th or 9th table, which might not be a safe assumption. Consider adding a verification step to ensure the correct table is being used.
+ if len(content) < 9: + raise ValueError("The number of tables on the page is less than expected.")
df_lst = pandas.read_html( | ||
"https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page)" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle potential errors during data retrieval.
The function does not handle potential errors that may occur during the data retrieval process. Add error handling to manage cases where the URL might be inaccessible or the data format changes.
+ try:
+ df_lst = pandas.read_html(
+ "https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page)"
+ )
+ except Exception as e:
+ raise ValueError(f"Error retrieving data from Wikipedia: {e}")
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
df_lst = pandas.read_html( | |
"https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page)" | |
) | |
try: | |
df_lst = pandas.read_html( | |
"https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page)" | |
) | |
except Exception as e: | |
raise ValueError(f"Error retrieving data from Wikipedia: {e}") |
def get_elastic_properties(chemical_symbol: str) -> dict: | ||
""" | ||
Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com. | ||
|
||
Args: | ||
chemical_symbol (str): Chemical symbol of the element. | ||
property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus | ||
|
||
Returns: | ||
str: Property value (various types): Value of the property for the given element, if available. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the docstring to match the function signature.
The docstring mentions an argument property
which is not present in the function signature. Remove the mention of the property
argument from the docstring.
- property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def get_elastic_properties(chemical_symbol: str) -> dict: | |
""" | |
Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com. | |
Args: | |
chemical_symbol (str): Chemical symbol of the element. | |
property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus | |
Returns: | |
str: Property value (various types): Value of the property for the given element, if available. | |
""" | |
def get_elastic_properties(chemical_symbol: str) -> dict: | |
""" | |
Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com. | |
Args: | |
chemical_symbol (str): Chemical symbol of the element. | |
Returns: | |
str: Property value (various types): Value of the property for the given element, if available. | |
""" |
df = fetch_table("elements") | ||
return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle cases where the chemical symbol is not found.
The function does not handle cases where the chemical symbol is not found in the table. Add error handling to manage this scenario.
+ element_data = df[df.symbol == chemical_symbol]
+ if element_data.empty:
+ raise ValueError(f"Chemical symbol '{chemical_symbol}' not found in the table.")
+ return element_data.squeeze(axis=0).to_dict()
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
df = fetch_table("elements") | |
return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict() | |
df = fetch_table("elements") | |
element_data = df[df.symbol == chemical_symbol] | |
if element_data.empty: | |
raise ValueError(f"Chemical symbol '{chemical_symbol}' not found in the table.") | |
return element_data.squeeze(axis=0).to_dict() |
def _select_function_scientific(v: Union[str, float]) -> float: | ||
""" | ||
Converts a scientific notation string to a float. | ||
|
||
Args: | ||
v (Union[str, float]): The value to be converted. If it is a string, it should be in scientific notation. | ||
|
||
Returns: | ||
float: The converted value. | ||
|
||
Example: | ||
>>> _select_function_scientific('1.23E-4') | ||
0.000123 | ||
""" | ||
if isinstance(v, str): | ||
return float(v.split()[0].replace("×10", "E")) | ||
else: | ||
return v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle unexpected input formats.
The function does not handle cases where the input string is not in the expected format. Add error handling to manage these scenarios.
+ try:
+ if isinstance(v, str):
+ return float(v.split()[0].replace("×10", "E"))
+ else:
+ return v
+ except (ValueError, IndexError) as e:
+ raise ValueError(f"Unexpected input format for scientific value '{v}': {e}")
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _select_function_scientific(v: Union[str, float]) -> float: | |
""" | |
Converts a scientific notation string to a float. | |
Args: | |
v (Union[str, float]): The value to be converted. If it is a string, it should be in scientific notation. | |
Returns: | |
float: The converted value. | |
Example: | |
>>> _select_function_scientific('1.23E-4') | |
0.000123 | |
""" | |
if isinstance(v, str): | |
return float(v.split()[0].replace("×10", "E")) | |
else: | |
return v | |
def _select_function_scientific(v: Union[str, float]) -> float: | |
""" | |
Converts a scientific notation string to a float. | |
Args: | |
v (Union[str, float]): The value to be converted. If it is a string, it should be in scientific notation. | |
Returns: | |
float: The converted value. | |
Example: | |
>>> _select_function_scientific('1.23E-4') | |
0.000123 | |
""" | |
try: | |
if isinstance(v, str): | |
return float(v.split()[0].replace("×10", "E")) | |
else: | |
return v | |
except (ValueError, IndexError) as e: | |
raise ValueError(f"Unexpected input format for scientific value '{v}': {e}") |
def _select_function_lattice(v: str) -> tuple[float, float, float]: | ||
""" | ||
Convert a string representation of a lattice vector to a tuple of floats. | ||
|
||
Args: | ||
v (str): The string representation of the lattice vector. | ||
|
||
Returns: | ||
tuple[float, float, float]: The lattice vector as a tuple of floats. | ||
""" | ||
return ( | ||
float(v.split(", ")[0]), | ||
float(v.split(", ")[1]), | ||
float(v.split(", ")[2].split()[0]), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle unexpected input formats.
The function does not handle cases where the input string is not in the expected format. Add error handling to manage these scenarios.
+ try:
+ return (
+ float(v.split(", ")[0]),
+ float(v.split(", ")[1]),
+ float(v.split(", ")[2].split()[0]),
+ )
+ except (ValueError, IndexError) as e:
+ raise ValueError(f"Unexpected input format for lattice value '{v}': {e}")
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _select_function_lattice(v: str) -> tuple[float, float, float]: | |
""" | |
Convert a string representation of a lattice vector to a tuple of floats. | |
Args: | |
v (str): The string representation of the lattice vector. | |
Returns: | |
tuple[float, float, float]: The lattice vector as a tuple of floats. | |
""" | |
return ( | |
float(v.split(", ")[0]), | |
float(v.split(", ")[1]), | |
float(v.split(", ")[2].split()[0]), | |
) | |
def _select_function_lattice(v: str) -> tuple[float, float, float]: | |
""" | |
Convert a string representation of a lattice vector to a tuple of floats. | |
Args: | |
v (str): The string representation of the lattice vector. | |
Returns: | |
tuple[float, float, float]: The lattice vector as a tuple of floats. | |
""" | |
try: | |
return ( | |
float(v.split(", ")[0]), | |
float(v.split(", ")[1]), | |
float(v.split(", ")[2].split()[0]), | |
) | |
except (ValueError, IndexError) as e: | |
raise ValueError(f"Unexpected input format for lattice value '{v}': {e}") |
def _select_function_split(v: Union[str, float]) -> float: | ||
""" | ||
Splits a string and returns the first element as a float. | ||
|
||
Args: | ||
v (Union[str, float]): The input value to be processed. | ||
|
||
Returns: | ||
float: The first element of the input string as a float, or the input value if it's already a float. | ||
""" | ||
if isinstance(v, str): | ||
return float(v.split()[0]) | ||
else: | ||
return v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle unexpected input formats.
The function does not handle cases where the input string is not in the expected format. Add error handling to manage these scenarios.
+ try:
+ if isinstance(v, str):
+ return float(v.split()[0])
+ else:
+ return v
+ except (ValueError, IndexError) as e:
+ raise ValueError(f"Unexpected input format for value '{v}': {e}")
Committable suggestion was skipped due to low confidence.
def _extract_entry(df: pandas.DataFrame, n: str) -> Optional[Union[str, float]]: | ||
""" | ||
Extracts the value corresponding to the given element name from a DataFrame. | ||
|
||
Args: | ||
df (pandas.DataFrame): The DataFrame to extract data from. | ||
n (str): The element name to extract data for. | ||
|
||
Returns: | ||
Optional[Union[str, float]]: The extracted value if found, None otherwise. | ||
""" | ||
for i in range(int(len(df.columns) / 2)): | ||
if n in df[i].values: | ||
v = df[i + 1].values[df[i].values.tolist().index(n)] | ||
return v | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle cases where the element name is not found.
The function does not handle cases where the element name is not found in the DataFrame. Add error handling to manage this scenario.
+ for i in range(int(len(df.columns) / 2)):
+ if n in df[i].values:
+ v = df[i + 1].values[df[i].values.tolist().index(n)]
+ return v
+ raise ValueError(f"Element name '{n}' not found in the DataFrame.")
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _extract_entry(df: pandas.DataFrame, n: str) -> Optional[Union[str, float]]: | |
""" | |
Extracts the value corresponding to the given element name from a DataFrame. | |
Args: | |
df (pandas.DataFrame): The DataFrame to extract data from. | |
n (str): The element name to extract data for. | |
Returns: | |
Optional[Union[str, float]]: The extracted value if found, None otherwise. | |
""" | |
for i in range(int(len(df.columns) / 2)): | |
if n in df[i].values: | |
v = df[i + 1].values[df[i].values.tolist().index(n)] | |
return v | |
return None | |
def _extract_entry(df: pandas.DataFrame, n: str) -> Optional[Union[str, float]]: | |
""" | |
Extracts the value corresponding to the given element name from a DataFrame. | |
Args: | |
df (pandas.DataFrame): The DataFrame to extract data from. | |
n (str): The element name to extract data for. | |
Returns: | |
Optional[Union[str, float]]: The extracted value if found, None otherwise. | |
""" | |
for i in range(int(len(df.columns) / 2)): | |
if n in df[i].values: | |
v = df[i + 1].values[df[i].values.tolist().index(n)] | |
return v | |
raise ValueError(f"Element name '{n}' not found in the DataFrame.") |
def _collect( | ||
url: str, column: str, select_function: callable, current_filter: callable | ||
) -> pandas.DataFrame: | ||
""" | ||
Collect data from a given URL and extract specific properties. | ||
|
||
Args: | ||
url: The URL to fetch the data from. | ||
column: The name of the column to extract. | ||
select_function: The function to apply to the extracted data. | ||
current_filter: The filter function to apply to the extracted data. | ||
|
||
Returns: | ||
A pandas DataFrame containing the extracted data. | ||
""" | ||
return pandas.DataFrame( | ||
{ | ||
n: d | ||
for n, d in zip( | ||
["element", column], | ||
_extract_lst( | ||
df=_get_content_from_url(url=url), | ||
column=column, | ||
select_function=select_function, | ||
current_filter=current_filter, | ||
), | ||
) | ||
} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle potential errors during data retrieval.
The function does not handle potential errors that may occur during the data retrieval process. Add error handling to manage cases where the URL might be inaccessible or the data format changes.
+ try:
+ data = _extract_lst(
+ df=_get_content_from_url(url=url),
+ column=column,
+ select_function=select_function,
+ current_filter=current_filter,
+ )
+ except Exception as e:
+ raise ValueError(f"Error collecting data from URL '{url}': {e}")
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _collect( | |
url: str, column: str, select_function: callable, current_filter: callable | |
) -> pandas.DataFrame: | |
""" | |
Collect data from a given URL and extract specific properties. | |
Args: | |
url: The URL to fetch the data from. | |
column: The name of the column to extract. | |
select_function: The function to apply to the extracted data. | |
current_filter: The filter function to apply to the extracted data. | |
Returns: | |
A pandas DataFrame containing the extracted data. | |
""" | |
return pandas.DataFrame( | |
{ | |
n: d | |
for n, d in zip( | |
["element", column], | |
_extract_lst( | |
df=_get_content_from_url(url=url), | |
column=column, | |
select_function=select_function, | |
current_filter=current_filter, | |
), | |
) | |
} | |
) | |
def _collect( | |
url: str, column: str, select_function: callable, current_filter: callable | |
) -> pandas.DataFrame: | |
""" | |
Collect data from a given URL and extract specific properties. | |
Args: | |
url: The URL to fetch the data from. | |
column: The name of the column to extract. | |
select_function: The function to apply to the extracted data. | |
current_filter: The filter function to apply to the extracted data. | |
Returns: | |
A pandas DataFrame containing the extracted data. | |
""" | |
try: | |
data = _extract_lst( | |
df=_get_content_from_url(url=url), | |
column=column, | |
select_function=select_function, | |
current_filter=current_filter, | |
) | |
except Exception as e: | |
raise ValueError(f"Error collecting data from URL '{url}': {e}") | |
return pandas.DataFrame( | |
{ | |
n: d | |
for n, d in zip( | |
["element", column], | |
data, | |
) | |
} | |
) |
def _extract_lst( | ||
df: pandas.DataFrame, | ||
column: str, | ||
select_function: Callable, | ||
current_filter: Callable, | ||
) -> Tuple[List[str], List[str]]: | ||
""" | ||
Extracts element and property lists from a DataFrame based on the given column, select function, and current filter. | ||
|
||
Args: | ||
df (DataFrame): The DataFrame to extract data from. | ||
column (str): The column name to extract data from. | ||
select_function (Callable): A function that selects a property value from a DataFrame entry. | ||
current_filter (Callable): A function that filters DataFrame entries based on a condition. | ||
|
||
Returns: | ||
Tuple[List[str], List[str]]: A tuple containing the element list and property list. | ||
|
||
Raises: | ||
ValueError: If an error occurs during the extraction process. | ||
""" | ||
element_lst, property_lst = [], [] | ||
try: | ||
ptable = fetch_table("elements") | ||
for n, el in zip(ptable.name.values, ptable.symbol.values): | ||
v = _extract_entry(df=df, n=n) | ||
if current_filter(v): | ||
element_lst.append(el) | ||
property_lst.append(select_function(v)) | ||
except ValueError as e: | ||
raise ValueError( | ||
f"Error extracting {column} for element {el} with value {v}" | ||
) from e | ||
return element_lst, property_lst |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle cases where the DataFrame is empty or the column is not found.
The function does not handle cases where the DataFrame is empty or the column is not found. Add error handling to manage these scenarios.
+ if df.empty:
+ raise ValueError("The DataFrame is empty.")
+ if column not in df.columns:
+ raise ValueError(f"Column '{column}' not found in the DataFrame.")
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _extract_lst( | |
df: pandas.DataFrame, | |
column: str, | |
select_function: Callable, | |
current_filter: Callable, | |
) -> Tuple[List[str], List[str]]: | |
""" | |
Extracts element and property lists from a DataFrame based on the given column, select function, and current filter. | |
Args: | |
df (DataFrame): The DataFrame to extract data from. | |
column (str): The column name to extract data from. | |
select_function (Callable): A function that selects a property value from a DataFrame entry. | |
current_filter (Callable): A function that filters DataFrame entries based on a condition. | |
Returns: | |
Tuple[List[str], List[str]]: A tuple containing the element list and property list. | |
Raises: | |
ValueError: If an error occurs during the extraction process. | |
""" | |
element_lst, property_lst = [], [] | |
try: | |
ptable = fetch_table("elements") | |
for n, el in zip(ptable.name.values, ptable.symbol.values): | |
v = _extract_entry(df=df, n=n) | |
if current_filter(v): | |
element_lst.append(el) | |
property_lst.append(select_function(v)) | |
except ValueError as e: | |
raise ValueError( | |
f"Error extracting {column} for element {el} with value {v}" | |
) from e | |
return element_lst, property_lst | |
def _extract_lst( | |
df: pandas.DataFrame, | |
column: str, | |
select_function: Callable, | |
current_filter: Callable, | |
) -> Tuple[List[str], List[str]]: | |
""" | |
Extracts element and property lists from a DataFrame based on the given column, select function, and current filter. | |
Args: | |
df (DataFrame): The DataFrame to extract data from. | |
column (str): The column name to extract data from. | |
select_function (Callable): A function that selects a property value from a DataFrame entry. | |
current_filter (Callable): A function that filters DataFrame entries based on a condition. | |
Returns: | |
Tuple[List[str], List[str]]: A tuple containing the element list and property list. | |
Raises: | |
ValueError: If an error occurs during the extraction process. | |
""" | |
if df.empty: | |
raise ValueError("The DataFrame is empty.") | |
if column not in df.columns: | |
raise ValueError(f"Column '{column}' not found in the DataFrame.") | |
element_lst, property_lst = [], [] | |
try: | |
ptable = fetch_table("elements") | |
for n, el in zip(ptable.name.values, ptable.symbol.values): | |
v = _extract_entry(df=df, n=n) | |
if current_filter(v): | |
element_lst.append(el) | |
property_lst.append(select_function(v)) | |
except ValueError as e: | |
raise ValueError( | |
f"Error extracting {column} for element {el} with value {v}" | |
) from e | |
return element_lst, property_lst |
"thermalcondictivity": { | ||
"url": "https://periodictable.com/Properties/A/ThermalConductivity.an.html", | ||
"select_function": _select_function_split, | ||
"current_filter": _default_filter, | ||
}, | ||
"atomicradius": { | ||
"url": "https://periodictable.com/Properties/A/AtomicRadius.an.html", | ||
"select_function": _select_function_split, | ||
"current_filter": _default_filter, | ||
}, | ||
"bulkmodulus": { | ||
"url": "https://periodictable.com/Properties/A/BulkModulus.an.html", | ||
"select_function": _select_function_split, | ||
"current_filter": _default_filter, | ||
}, | ||
"shearmodulus": { | ||
"url": "https://periodictable.com/Properties/A/ShearModulus.an.html", | ||
"select_function": _select_function_split, | ||
"current_filter": _default_filter, | ||
}, | ||
"youngmodulus": { | ||
"url": "https://periodictable.com/Properties/A/YoungModulus.an.html", | ||
"select_function": _select_function_split, | ||
"current_filter": _default_filter, | ||
}, | ||
"poissonratio": { | ||
"url": "https://periodictable.com/Properties/A/PoissonRatio.an.html", | ||
"select_function": _select_function_poisson, | ||
"current_filter": _poisson_filter, | ||
}, | ||
"density": { | ||
"url": "https://periodictable.com/Properties/A/Density.an.html", | ||
"select_function": _select_function_density, | ||
"current_filter": _default_filter, | ||
}, | ||
"liquiddensity": { | ||
"url": "https://periodictable.com/Properties/A/LiquidDensity.an.html", | ||
"select_function": _select_function_split, | ||
"current_filter": _default_filter, | ||
}, | ||
"thermalexpansion": { | ||
"url": "https://periodictable.com/Properties/A/ThermalExpansion.an.html", | ||
"select_function": _select_function_scientific, | ||
"current_filter": _default_filter, | ||
}, | ||
"meltingpoint": { | ||
"url": "https://periodictable.com/Properties/A/AbsoluteMeltingPoint.an.html", | ||
"select_function": _select_function_scientific, | ||
"current_filter": _default_filter, | ||
}, | ||
"vaporizationheat": { | ||
"url": "https://periodictable.com/Properties/A/VaporizationHeat.an.html", | ||
"select_function": _select_function_split, | ||
"current_filter": _default_filter, | ||
}, | ||
"specificheat": { | ||
"url": "https://periodictable.com/Properties/A/SpecificHeat.an.html", | ||
"select_function": _select_function_split, | ||
"current_filter": _default_filter, | ||
}, | ||
"latticeconstant": { | ||
"url": "https://periodictable.com/Properties/A/LatticeConstants.an.html", | ||
"select_function": _select_function_lattice, | ||
"current_filter": _default_filter, | ||
}, | ||
"crystal": { | ||
"url": "https://periodictable.com/Properties/A/CrystalStructure.an.html", | ||
"select_function": _select_function_poisson, | ||
"current_filter": _default_filter, | ||
}, | ||
"volmolar": { | ||
"url": "https://periodictable.com/Properties/A/MolarVolume.an.html", | ||
"select_function": _select_function_scientific, | ||
"current_filter": _default_filter, | ||
}, | ||
"mass": { | ||
"url": "https://periodictable.com/Properties/A/AtomicMass.an.html", | ||
"select_function": _select_function_mass, | ||
"current_filter": _default_filter, | ||
}, | ||
} | ||
result = pandas.concat( | ||
[ | ||
_collect( | ||
url=v["url"], | ||
column=k, | ||
select_function=v["select_function"], | ||
current_filter=v["current_filter"], | ||
).set_index("element") | ||
for k, v in data_dict.items() | ||
], | ||
axis=1, | ||
sort=False, | ||
) | ||
result["volume"] = result.apply( | ||
lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1 | ||
) | ||
data_path = os.path.join(os.path.dirname(__file__), "data") | ||
os.makedirs(data_path, exist_ok=True) | ||
result.to_csv(os.path.join(data_path, "wolfram.csv")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle potential errors during data retrieval and file writing.
The function does not handle potential errors that may occur during the data retrieval and file writing process. Add error handling to manage these scenarios.
+ try:
+ result = pandas.concat(
+ [
+ _collect(
+ url=v["url"],
+ column=k,
+ select_function=v["select_function"],
+ current_filter=v["current_filter"],
+ ).set_index("element")
+ for k, v in data_dict.items()
+ ],
+ axis=1,
+ sort=False,
+ )
+ result["volume"] = result.apply(
+ lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1
+ )
+ data_path = os.path.join(os.path.dirname(__file__), "data")
+ os.makedirs(data_path, exist_ok=True)
+ result.to_csv(os.path.join(data_path, "wolfram.csv"))
+ except Exception as e:
+ raise ValueError(f"Error downloading data from Wolfram Alpha: {e}")
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _wolframalpha_download() -> None: | |
""" | |
Downloads data from Wolfram Alpha and saves it to a CSV file. | |
This function downloads various properties data from Wolfram Alpha using the provided URLs in the `data_dict`. | |
The downloaded data is then processed and saved to a CSV file named "wolfram.csv" in the "data" directory. | |
Note: This function requires internet connectivity to download the data. | |
""" | |
data_dict = { | |
"thermalcondictivity": { | |
"url": "https://periodictable.com/Properties/A/ThermalConductivity.an.html", | |
"select_function": _select_function_split, | |
"current_filter": _default_filter, | |
}, | |
"atomicradius": { | |
"url": "https://periodictable.com/Properties/A/AtomicRadius.an.html", | |
"select_function": _select_function_split, | |
"current_filter": _default_filter, | |
}, | |
"bulkmodulus": { | |
"url": "https://periodictable.com/Properties/A/BulkModulus.an.html", | |
"select_function": _select_function_split, | |
"current_filter": _default_filter, | |
}, | |
"shearmodulus": { | |
"url": "https://periodictable.com/Properties/A/ShearModulus.an.html", | |
"select_function": _select_function_split, | |
"current_filter": _default_filter, | |
}, | |
"youngmodulus": { | |
"url": "https://periodictable.com/Properties/A/YoungModulus.an.html", | |
"select_function": _select_function_split, | |
"current_filter": _default_filter, | |
}, | |
"poissonratio": { | |
"url": "https://periodictable.com/Properties/A/PoissonRatio.an.html", | |
"select_function": _select_function_poisson, | |
"current_filter": _poisson_filter, | |
}, | |
"density": { | |
"url": "https://periodictable.com/Properties/A/Density.an.html", | |
"select_function": _select_function_density, | |
"current_filter": _default_filter, | |
}, | |
"liquiddensity": { | |
"url": "https://periodictable.com/Properties/A/LiquidDensity.an.html", | |
"select_function": _select_function_split, | |
"current_filter": _default_filter, | |
}, | |
"thermalexpansion": { | |
"url": "https://periodictable.com/Properties/A/ThermalExpansion.an.html", | |
"select_function": _select_function_scientific, | |
"current_filter": _default_filter, | |
}, | |
"meltingpoint": { | |
"url": "https://periodictable.com/Properties/A/AbsoluteMeltingPoint.an.html", | |
"select_function": _select_function_scientific, | |
"current_filter": _default_filter, | |
}, | |
"vaporizationheat": { | |
"url": "https://periodictable.com/Properties/A/VaporizationHeat.an.html", | |
"select_function": _select_function_split, | |
"current_filter": _default_filter, | |
}, | |
"specificheat": { | |
"url": "https://periodictable.com/Properties/A/SpecificHeat.an.html", | |
"select_function": _select_function_split, | |
"current_filter": _default_filter, | |
}, | |
"latticeconstant": { | |
"url": "https://periodictable.com/Properties/A/LatticeConstants.an.html", | |
"select_function": _select_function_lattice, | |
"current_filter": _default_filter, | |
}, | |
"crystal": { | |
"url": "https://periodictable.com/Properties/A/CrystalStructure.an.html", | |
"select_function": _select_function_poisson, | |
"current_filter": _default_filter, | |
}, | |
"volmolar": { | |
"url": "https://periodictable.com/Properties/A/MolarVolume.an.html", | |
"select_function": _select_function_scientific, | |
"current_filter": _default_filter, | |
}, | |
"mass": { | |
"url": "https://periodictable.com/Properties/A/AtomicMass.an.html", | |
"select_function": _select_function_mass, | |
"current_filter": _default_filter, | |
}, | |
} | |
result = pandas.concat( | |
[ | |
_collect( | |
url=v["url"], | |
column=k, | |
select_function=v["select_function"], | |
current_filter=v["current_filter"], | |
).set_index("element") | |
for k, v in data_dict.items() | |
], | |
axis=1, | |
sort=False, | |
) | |
result["volume"] = result.apply( | |
lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1 | |
) | |
data_path = os.path.join(os.path.dirname(__file__), "data") | |
os.makedirs(data_path, exist_ok=True) | |
result.to_csv(os.path.join(data_path, "wolfram.csv")) | |
def _wolframalpha_download() -> None: | |
""" | |
Downloads data from Wolfram Alpha and saves it to a CSV file. | |
This function downloads various properties data from Wolfram Alpha using the provided URLs in the `data_dict`. | |
The downloaded data is then processed and saved to a CSV file named "wolfram.csv" in the "data" directory. | |
Note: This function requires internet connectivity to download the data. | |
""" | |
data_dict = { | |
"thermalcondictivity": { | |
"url": "https://periodictable.com/Properties/A/ThermalConductivity.an.html", | |
"select_function": _select_function_split, | |
"current_filter": _default_filter, | |
}, | |
"atomicradius": { | |
"url": "https://periodictable.com/Properties/A/AtomicRadius.an.html", | |
"select_function": _select_function_split, | |
"current_filter": _default_filter, | |
}, | |
"bulkmodulus": { | |
"url": "https://periodictable.com/Properties/A/BulkModulus.an.html", | |
"select_function": _select_function_split, | |
"current_filter": _default_filter, | |
}, | |
"shearmodulus": { | |
"url": "https://periodictable.com/Properties/A/ShearModulus.an.html", | |
"select_function": _select_function_split, | |
"current_filter": _default_filter, | |
}, | |
"youngmodulus": { | |
"url": "https://periodictable.com/Properties/A/YoungModulus.an.html", | |
"select_function": _select_function_split, | |
"current_filter": _default_filter, | |
}, | |
"poissonratio": { | |
"url": "https://periodictable.com/Properties/A/PoissonRatio.an.html", | |
"select_function": _select_function_poisson, | |
"current_filter": _poisson_filter, | |
}, | |
"density": { | |
"url": "https://periodictable.com/Properties/A/Density.an.html", | |
"select_function": _select_function_density, | |
"current_filter": _default_filter, | |
}, | |
"liquiddensity": { | |
"url": "https://periodictable.com/Properties/A/LiquidDensity.an.html", | |
"select_function": _select_function_split, | |
"current_filter": _default_filter, | |
}, | |
"thermalexpansion": { | |
"url": "https://periodictable.com/Properties/A/ThermalExpansion.an.html", | |
"select_function": _select_function_scientific, | |
"current_filter": _default_filter, | |
}, | |
"meltingpoint": { | |
"url": "https://periodictable.com/Properties/A/AbsoluteMeltingPoint.an.html", | |
"select_function": _select_function_scientific, | |
"current_filter": _default_filter, | |
}, | |
"vaporizationheat": { | |
"url": "https://periodictable.com/Properties/A/VaporizationHeat.an.html", | |
"select_function": _select_function_split, | |
"current_filter": _default_filter, | |
}, | |
"specificheat": { | |
"url": "https://periodictable.com/Properties/A/SpecificHeat.an.html", | |
"select_function": _select_function_split, | |
"current_filter": _default_filter, | |
}, | |
"latticeconstant": { | |
"url": "https://periodictable.com/Properties/A/LatticeConstants.an.html", | |
"select_function": _select_function_lattice, | |
"current_filter": _default_filter, | |
}, | |
"crystal": { | |
"url": "https://periodictable.com/Properties/A/CrystalStructure.an.html", | |
"select_function": _select_function_poisson, | |
"current_filter": _default_filter, | |
}, | |
"volmolar": { | |
"url": "https://periodictable.com/Properties/A/MolarVolume.an.html", | |
"select_function": _select_function_scientific, | |
"current_filter": _default_filter, | |
}, | |
"mass": { | |
"url": "https://periodictable.com/Properties/A/AtomicMass.an.html", | |
"select_function": _select_function_mass, | |
"current_filter": _default_filter, | |
}, | |
} | |
try: | |
result = pandas.concat( | |
[ | |
_collect( | |
url=v["url"], | |
column=k, | |
select_function=v["select_function"], | |
current_filter=v["current_filter"], | |
).set_index("element") | |
for k, v in data_dict.items() | |
], | |
axis=1, | |
sort=False, | |
) | |
result["volume"] = result.apply( | |
lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1 | |
) | |
data_path = os.path.join(os.path.dirname(__file__), "data") | |
os.makedirs(data_path, exist_ok=True) | |
result.to_csv(os.path.join(data_path, "wolfram.csv")) | |
except Exception as e: | |
raise ValueError(f"Error downloading data from Wolfram Alpha: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- atomistics/referencedata/wikipedia.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- atomistics/referencedata/wikipedia.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- .ci_support/environment-old.yml (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- .ci_support/environment-old.yml
Summary by CodeRabbit
These enhancements provide users with a structured and efficient way to access extensive chemical data, benefiting educational and research purposes.